Cluster freeze

Fabrice_Granatieri · October 1, 2018, 3:14pm

Hi.
Our cluster is:
11 servers, each with 128 GB RAM
a total of 3 master nodes and 33 data nodes, with 19 GB of heap allocated to each data node
so 3 nodes per server, or 4 on the servers that also host a master node
53 TB of data for over 5 billion docs in more than 380 indices, but more than 80% of the docs are in only 30 indices
over 1 TB of daily ingestion

First, all the nodes show almost 95% of RAM used, even with very few client connections.
Next, whenever we receive more than 25 HTTP requests, every node in the cluster reaches 100% of RAM used and the cluster freezes:

no monitoring data is visible in Kibana
very little data keeps being ingested
but no OOM or red status for the cluster
we never reach high CPU usage (max 20%)
we have a maximum of 70 users
Kibana is on a single server with 8 GB RAM

We see all the RAM is occupied by FS cache.
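
As a rough sanity check on that observation, here is a minimal sketch of the per-server memory budget using the figures quoted above. It ignores the master node heap and any non-heap JVM overhead, so treat the result as an approximation. On Linux, RAM not claimed by processes is normally used as filesystem cache and is reclaimed on demand, so a high "used" figure made up of cache is not by itself a sign of memory pressure.

```python
# Rough per-server memory budget, using the figures from the post above.
SERVER_RAM_GB = 128
DATA_NODES_PER_SERVER = 3
DATA_HEAP_GB = 19


def memory_budget(ram_gb, nodes, heap_gb):
    """Return (total heap in GB, RAM in GB left for the OS and filesystem cache)."""
    heap_total = nodes * heap_gb
    return heap_total, ram_gb - heap_total


heap_total, left_over = memory_budget(SERVER_RAM_GB, DATA_NODES_PER_SERVER, DATA_HEAP_GB)
print(f"heap: {heap_total} GB, left for OS + FS cache: {left_over} GB")
```

With these numbers, less than half of each server's RAM goes to heap, and the remainder is available to the OS for caching index files.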

Should we:

reduce to 2 nodes per server?
schedule an FS cache flush with cron?

Please advise

UPDATE-----------------------------------------------------------------------------------------------------

After a long investigation, we found that the problem came from a specific dashboard.
It searches a group of indices through a wildcard such as ourIndices*
The search request for every visualisation scans ALL shards, even those not relevant to the requested date range.

How is that possible?
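
A wildcard index pattern resolves to every index whose name matches, so the request is addressed to all of their shards. The date range is only a filter inside the query and does not change which indices are targeted. One possible mitigation is to target only the indices that cover the requested range. The sketch below assumes daily indices named `<prefix>-YYYY.MM.DD`, which is a hypothetical scheme: the thread only shows the wildcard, not the actual naming.

```python
from datetime import date, timedelta


def daily_indices(prefix, start, end):
    """List daily index names from start to end, inclusive.

    Assumes a hypothetical naming scheme '<prefix>-YYYY.MM.DD'.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [f"{prefix}-{(start + timedelta(days=i)):%Y.%m.%d}" for i in range(days)]


# Placeholder prefix; real Elasticsearch index names are lowercase.
names = daily_indices("ourindices", date(2018, 9, 29), date(2018, 10, 1))
print(",".join(names))
```

The comma-separated list can be used in place of the wildcard in the search path, so that only three indices are searched instead of every match.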

warkolm · October 3, 2018, 10:24am

How many shards?

Christian_Dahlqvist · October 3, 2018, 10:32am

Which version of Elasticsearch are you using? Have you optimised your mappings as described here?

Fabrice_Granatieri · October 3, 2018, 10:38am

Hi Mark,
around 5500 shards

warkolm · October 3, 2018, 10:40am

That's the problem then, that's around 2000 per node which is waaaaay too high.

Christian_Dahlqvist · October 3, 2018, 10:48am

As you have 11 servers with 33 data nodes, I think that sounds like a reasonable amount of shards.

warkolm · October 3, 2018, 10:53am

Oh, I added a zero to the end in my brain. Woops!

Fabrice_Granatieri · October 3, 2018, 10:59am

No, we have 33 nodes for (sorry) around 7450 shards (primaries AND replicas),
which makes a ratio of about 225 shards per node.
What is a "good" ratio?

And no, we haven't really optimised our mappings yet.
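
For reference, the ratio can be checked with a few lines. The sketch below also compares it against a rule of thumb from Elastic's sharding guidance of that period, roughly 20 shards per GB of heap. That figure is an assumption brought in for illustration, not something stated in this thread, and it is a guideline rather than a hard limit.

```python
def shards_per_node(total_shards, data_nodes):
    """Average number of shards held by each data node."""
    return total_shards / data_nodes


def shard_budget(heap_gb, shards_per_gb_heap=20):
    """Rule-of-thumb ceiling on shards per node (assumed: 20 per GB of heap)."""
    return heap_gb * shards_per_gb_heap


ratio = shards_per_node(7450, 33)
budget = shard_budget(19)
print(f"{ratio:.0f} shards per node, rule-of-thumb ceiling {budget}")
```

Under that assumption, the cluster sits below the ceiling for a 19 GB heap, which is consistent with the view that the shard count alone is not the problem.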


Christian_Dahlqvist · October 3, 2018, 11:06am

Read this blog post about shards and sharding guidelines.

If you have not optimised your mappings, it is possible that all string fields are mapped as text as well as keyword. This default dual mapping adds a lot of flexibility, but this comes at the cost of increased heap and disk usage. Optimising this can save you a lot of heap and make your cluster run better.
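
As an illustration of what such an optimisation might look like, the sketch below builds a dynamic template that maps new string fields as `keyword` only, instead of the default `text` plus `keyword` pair. The template name `strings_as_keyword` is made up for the example, and where the fragment sits inside the mapping body depends on the Elasticsearch version. Fields that need full-text search would still have to be mapped explicitly as `text`.

```python
import json


def keyword_only_dynamic_template():
    """Build a mapping fragment that maps dynamic string fields as keyword only."""
    return {
        "dynamic_templates": [
            {
                # Hypothetical template name, chosen for this example.
                "strings_as_keyword": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword", "ignore_above": 256},
                }
            }
        ]
    }


print(json.dumps(keyword_only_dynamic_template(), indent=2))
```

The fragment only affects fields created after it is in place, so for existing indices it would typically go into an index template and take effect as new indices are created.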

Another way to reduce heap usage can be to force merge indices down to a single segment per shard. This is however very I/O intensive and can affect the performance of the cluster. This should only ever be done for indices no longer being written to.
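
A minimal sketch of the corresponding request follows. It only builds the HTTP method and path for the force merge API and sends nothing. The index name is hypothetical and stands for an index that is no longer being written to.

```python
from urllib.parse import quote, urlencode


def forcemerge_request(index, max_num_segments=1):
    """Return (method, path) for a force merge call; nothing is sent."""
    path = f"/{quote(index, safe='')}/_forcemerge"
    query = urlencode({"max_num_segments": max_num_segments})
    return "POST", f"{path}?{query}"


# Hypothetical index name for an old, read-only daily index.
print(forcemerge_request("logs-2018.09.01"))
```

Given the I/O cost mentioned above, running such requests one index at a time during quiet hours is a reasonable precaution.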

If you are using an older version of Elasticsearch, the _all field may also be something to look at, as described in this blog post.

Fabrice_Granatieri · October 3, 2018, 12:20pm

Thank you for all your advice.
I'll contact the users to narrow all the field properties down to the smallest heap cost.

One more piece of information: we only allocated 2 GB for the master nodes' heap.
I think it's too little and I want to increase it to 4 GB.

What do you think?

Christian_Dahlqvist · October 3, 2018, 1:08pm

If you have monitoring installed I would recommend looking at heap usage on master nodes over time. If you are not seeing a nice saw-tooth pattern it may be time to increase the size of the heap.
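
To make the saw-tooth idea concrete, here is a rough heuristic sketch: heap usage that climbs and then repeatedly drops sharply after garbage collection counts as a saw-tooth, while usage that stays high with only small dips does not. The thresholds and the sample series are invented for illustration and are not measurements from this cluster.

```python
def looks_like_sawtooth(heap_pct, min_drop=10.0, min_drops=2):
    """Heuristic: True if heap usage repeatedly falls by at least min_drop points."""
    drops = sum(
        1 for prev, cur in zip(heap_pct, heap_pct[1:]) if prev - cur >= min_drop
    )
    return drops >= min_drops


# Synthetic heap-used percentages sampled over time (illustrative only).
healthy = [40, 55, 70, 35, 50, 68, 33, 49]
pressured = [88, 90, 91, 89, 92, 93, 92, 94]
print(looks_like_sawtooth(healthy), looks_like_sawtooth(pressured))
```

In practice the graph in the monitoring UI is usually enough to judge this by eye; a check like this would only be useful for alerting.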

system · October 31, 2018, 1:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
