Elasticsearch 5.x goes to YELLOW state more frequently due to high GC activity

bsarkar · October 26, 2017, 5:36am

We recently moved from ES 2.x to ES 5.4.1 and are seeing an increased GC activity on the data nodes. We're seeing frequent young GC happening and each cycle taking just less than a second. Due to lots of GC, data nodes are unresponsive to pings by Master node and as a result go out of the cluster only to join the cluster within a few minutes. But the cluster goes to YELLOW state and it takes time to recover the replica shards. One mitigation would be to increase the value of the setting "index.unassigned.node_left.delayed_timeout" but we want to make changes to reduce the GC pressure. This could be due to a lot of factors and we're looking at them one by one. For the purpose of this topic, we want to make sure our GC options are correctly set. For a data node having 28GB RAM, we've set the following options:
-Xms14g
-Xmx14g
-Xmn4g

The last setting sets the size of new heap size to 4 GB. We've used this setting successfully on ES 2.x clusters and have carried on with this setting for ES 5.x clusters as well. With ES 2.x, we were using Java 1.7. With ES 5.x, we're using Java 1.8. My question is, do we need to set this option at all or can rely on the default settings? Also will enabling the GC logging settings commented out in jvm.options file by default help here? Let me know if more data is required.

thiago · October 30, 2017, 7:44am

You should unset the -Xmn option and leave it with default value.

Regarding memory pressure how many shards per node you cluster have? What is the average number of fields per mapping? Also, can you post here one of these log messages that shows young GC?

bsarkar · November 9, 2017, 5:48am

Sorry for the late reply @thiago. Are you suggesting removal of -Xmn option only because it is not a default setting or do you have anything else in mind? I'm asking this because we ran into similar GC issues on ES 2.x clusters long time back when we decided to set -Xmn to roughly 1/4th of memory allocated to heap. That brought stability to our clusters. On moving to ES 5.x, we did not make any change to this setting. Hence, I'm a little skeptical in simply unsetting this option unless something has changed in Java 1.8 which would improve GC with this option unset. Although, we're indexing roughly double the size of documents in ES 5.x compared to that in ES 2.x. That could be a factor as well causing increased GC.

We have around 50-70 shards per data node (including primary and replicas). We have two mappings - one has around 10-15 fields while the other has around 300-500 fields. Each index has only one mapping. Index containing mapping with 100s of fields has very small shards (not exceeding 5-10 GB). Index containing mapping with 10s of fields has huge shards (ranging from 10 to 500 GB). We're working on re-distributing the data so that these shard sizes are reduced.

Find log snippets around the time a data node went out of the cluster and soon joined back but left the cluster state in yellow state below:

https://gist.github.com/bittusarkar/996afb85474710dd38d36f96d211ccf6

bsarkar · November 9, 2017, 6:10am

Please find elasticsearch.yml and jvm.options files of the data node if interested at https://gist.github.com/bittusarkar/5c11174fe37db40687541b6975f3bd31

bsarkar · November 24, 2017, 7:23am

@thiago Do you have enough data to answer my queries?

thiago · November 24, 2017, 5:27pm

Yes, I recommended removing the -Xmn setting because we don't fully test with that setting. But maybe @jasontedor may have a more deep explanation.

system · December 22, 2017, 5:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Continous GC on Master Node Elasticsearch	7	864	October 4, 2018
ElasticSearch gc performance on cluster Elasticsearch	3	651	July 5, 2017
Please suggest performance settings for my ElasticSearch cluster Elasticsearch	21	2467	July 5, 2017
Elasticsearch nodes doing young generation gc very frequently Elasticsearch	7	2725	February 11, 2019
GC problems in ES 1.4.4 Elasticsearch	5	623	July 5, 2017

Elasticsearch 5.x goes to YELLOW state more frequently due to high GC activity

Related topics