Cluster freeze

Hi.
Our cluster is:
11 servers with 128 GB RAM each
a total of 3 master nodes and 33 data nodes, each with 19 GB allocated to the heap
So 3 or 4 nodes per server (4 on the servers that also host one of the 3 master nodes)
53 TB of data, over 5 billion docs in >380 indices, but >80% of the docs are in only 30 indices
over 1 TB of daily ingestion

First thing, all the nodes sit at almost 95% of used RAM,
even with very few client connections.
Next, whenever we receive more than 25 HTTP requests, every node in the cluster reaches 100% of used RAM and the cluster freezes:

  • no monitoring data is visible in Kibana
  • very little data keeps being ingested
  • but no OOM or red status for the cluster
  • we never reach high CPU % usage (max 20%)
  • we have a maximum of 70 users
    Kibana is on a single server with 8 GB RAM

We can see that all the RAM is occupied by the FS cache.
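For what it's worth, memory used by the FS (page) cache is reclaimable by the kernel on demand, so a high "used" figure is not by itself a sign of memory pressure. A minimal sketch to separate the cache from the memory that is actually still available on one of the servers (assumes a Linux host, nothing Elasticsearch-specific):

```python
# Sketch: distinguish reclaimable page cache from memory that is really in use.
# Assumes a Linux host; /proc/meminfo fields are standard (values in kB).

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of values in kB."""
    info = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.strip().split()[0])
    return info

m = read_meminfo()
total = m["MemTotal"]
cached = m.get("Cached", 0) + m.get("Buffers", 0)
available = m.get("MemAvailable", m["MemFree"])

print(f"total:     {total / 1024 / 1024:.1f} GB")
print(f"fs cache:  {cached / 1024 / 1024:.1f} GB (reclaimable)")
print(f"available: {available / 1024 / 1024:.1f} GB")
```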

Should we :

  • reduce to 2 nodes per server?
  • set up a cron job to flush the FS cache?

Please advise

UPDATE-----------------------------------------------------------------------------------------------------

After a lot of investigation we found that the problem was a specific dashboard.
We search a group of indices with a wildcard pattern like ourIndices*
The search request for every visualisation scans ALL the shards, even those that are not relevant for the selected date range?

How is that possible?
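A quick way to see how wide such a wildcard search fans out is to ask the cluster which shards it would be routed to. A sketch using the `_search_shards` API; the host, port and index pattern are placeholders to adapt:

```python
# Sketch: count how many shards a search over a wildcard pattern would hit.
# Assumes Elasticsearch is reachable on localhost:9200 and that "ourIndices*"
# matches the indices behind the dashboard.
import requests

ES = "http://localhost:9200"

resp = requests.get(f"{ES}/ourIndices*/_search_shards")
resp.raise_for_status()
shard_groups = resp.json()["shards"]  # one entry per shard (primary + its replicas)

print(f"a search over 'ourIndices*' is routed to {len(shard_groups)} shards")
```

If that number is close to the total shard count of the cluster, every visualisation on the dashboard is indeed touching almost everything.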

How many shards?

Which version of Elasticsearch are you using? Have you optimised your mappings as described here?

Hi Mark,
around 5500 shards

That's the problem then, that's around 2000 per node which is waaaaay too high.

As you have 11 servers with 33 data nodes, I think that sounds like a reasonable amount of shards.

Oh, I added a zero to the end in my brain. Woops!

No, we have 33 nodes for (sorry) around 7450 shards (primaries AND replicas),
which makes a ratio of about 225 shards per node.
What is a "good" ratio?

And no, we haven't really optimised our mappings yet.
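To double-check that shards-per-node figure per node rather than as an average, the `_cat/allocation` API reports how many shards each data node actually holds. A small sketch, with host and port assumed:

```python
# Sketch: shard count and disk usage per data node via the _cat/allocation API.
# Assumes Elasticsearch is reachable on localhost:9200.
import requests

ES = "http://localhost:9200"

rows = requests.get(f"{ES}/_cat/allocation", params={"format": "json"}).json()
for row in rows:
    # "shards" is the number of shards currently allocated to that node
    print(f"{row['node']:<25} {row['shards']:>6} shards   disk.used={row['disk.used']}")
```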

On Wed, 3 Oct 2018 at 12:51, Mark Walkom elastic@discoursemail.com wrote:

Read this blog post around shards and sharding guidelines.

If you have not optimised your mappings, it is possible that all string fields are mapped as text as well as keyword. This default dual mapping adds a lot of flexibility, but this comes at the cost of increased heap and disk usage. Optimising this can save you a lot of heap and make your cluster run better.
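As an illustration (not the exact mapping used in this cluster), a string field that is only ever used for exact filtering or aggregations can be mapped as `keyword` only, instead of the default `text` plus `.keyword` multi-field. A sketch that applies such a mapping through an index template, written against a 6.x-style single-type API; the template name, index pattern and field names are placeholders:

```python
# Sketch: index template mapping string fields explicitly instead of relying on
# the default dynamic text + keyword dual mapping. All names are hypothetical,
# and the syntax assumes a 6.x single-type ("_doc") mapping.
import requests

ES = "http://localhost:9200"

template = {
    "index_patterns": ["ourIndices-*"],
    "mappings": {
        "_doc": {
            "properties": {
                "status_code": {"type": "keyword"},            # exact match / aggregations only
                "message": {"type": "text", "norms": False},   # full-text search only
            }
        }
    },
}

r = requests.put(f"{ES}/_template/ourindices_mappings", json=template)
r.raise_for_status()
```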

Another way to reduce heap usage can be to force merge indices down to a single segment per shard. This is however very I/O intensive and can affect the performance of the cluster. This should only ever be done for indices no longer being written to.
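For reference, the force merge call itself is a single request (a sketch; the index name is a placeholder, and as said above it should only be run against indices that are no longer written to):

```python
# Sketch: force merge a read-only index down to one segment per shard.
# Only run this on indices that are no longer being written to.
import requests

ES = "http://localhost:9200"
index = "ourIndices-2018.09.01"  # hypothetical daily index that is now read-only

r = requests.post(f"{ES}/{index}/_forcemerge", params={"max_num_segments": 1})
r.raise_for_status()
print(r.json())
```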

If you are using an older version of Elasticsearch, the _all field may also be something to look at, as described in this blog post.
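On 5.x-era indices the `_all` field can be disabled in the mapping of new indices (from 6.0 onwards it is already disabled by default). A hedged sketch using the 5.x template syntax; the template name, index pattern and type name are placeholders:

```python
# Sketch: disable the _all field for new indices on a 5.x cluster via a template.
# Not needed on 6.0+, where _all is disabled by default. All names are hypothetical.
import requests

ES = "http://localhost:9200"

template = {
    "template": "ourIndices-*",          # 5.x legacy template pattern syntax
    "mappings": {
        "doc": {
            "_all": {"enabled": False}
        }
    },
}

r = requests.put(f"{ES}/_template/disable_all_field", json=template)
r.raise_for_status()
```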

Thank you for all your advice.
I'll contact the users so we can narrow all the field mappings down to the smallest possible heap cost.

One more piece of information: we only allocated 2 GB of heap to the master nodes.
I think that's too little and I want to increase it to 4 GB.

What do you think ?

If you have monitoring installed I would recommend looking at heap usage on master nodes over time. If you are not seeing a nice saw-tooth pattern it may be time to increase the size of the heap.
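If monitoring is not available, the node stats API gives the current heap picture for the master-eligible nodes; sampling it regularly (for example from cron) shows whether heap usage keeps climbing instead of following the usual saw-tooth. A sketch, with host and port assumed:

```python
# Sketch: print current JVM heap usage for the master-eligible nodes.
# Run periodically to see the trend over time.
import requests

ES = "http://localhost:9200"

nodes = requests.get(f"{ES}/_nodes/stats/jvm").json()["nodes"]
for node_id, node in nodes.items():
    if "master" in node.get("roles", []):
        heap = node["jvm"]["mem"]["heap_used_percent"]
        print(f"{node['name']:<20} heap_used_percent={heap}")
```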
