Long GC on Elasticsearch master nodes

Since upgrading to ES 2.4.2 I have been noticing the master nodes dropping out of and rejoining the cluster due to long GC pauses. They are master-only nodes with 16gb of memory. I even increased ES_HEAP_SIZE from 8g to 10g (ES_HEAP_SIZE=10g), but it still does not help. The cluster has 22 data nodes and 3 master-only nodes. Please help.

[2017-02-14 02:06:17,113][WARN ][monitor.jvm ] [ElasticSearch-15] [gc][young][24516][88] duration [1s], collections [1]/[1.8s], total [1s]/[8.3s], memory [2.1gb]->[2.3gb]/[7.9gb], all_pools {[young] [6.4mb]->[1016.6kb]/[266.2mb]}{[survivor] [1.3mb]->[33.2mb]/[33.2mb]}{[old] [2.1gb]->[2.3gb]/[7.6gb]}
[2017-02-14 06:06:52,933][WARN ][monitor.jvm ] [ElasticSearch-15] [gc][old][38932][3] duration [12.9s], collections [1]/[13s], total [12.9s]/[13s], memory [7.5gb]->[3.4gb]/[7.9gb], all_pools {[young] [1.5mb]->[1.1mb]/[266.2mb]}{[survivor] [33.2mb]->[0b]/[33.2mb]}{[old] [7.4gb]->[3.4gb]/[7.6gb]}
[2017-02-14 08:06:24,741][WARN ][monitor.jvm ] [ElasticSearch-15] [gc][young][46101][253] duration [1.1s], collections [1]/[1.9s], total [1.1s]/[35.8s], memory [5.2gb]->[5.4gb]/[7.9gb], all_pools {[young] [33mb]->[2.1mb]/[266.2mb]}{[survivor] [1.5mb]->[33.2mb]/[33.2mb]}{[old] [5.2gb]->[5.4gb]/[7.6gb]}
[2017-02-14 08:06:42,841][WARN ][monitor.jvm ] [ElasticSearch-15] [gc][old][46107][4] duration [10.5s], collections [1]/[11.4s], total [10.5s]/[23.6s], memory [7.3gb]->[2.5gb]/[7.9gb], all_pools {[young] [853.7kb]->[6.8mb]/[266.2mb]}{[survivor
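For reference, the relevant numbers can be pulled out of those monitor.jvm warnings with a short script (just a sketch for eyeballing the logs, not an official tool; the regex only matches the duration and heap-usage fields shown above):

```python
import re

# One of the monitor.jvm warnings pasted above
line = ("[2017-02-14 06:06:52,933][WARN ][monitor.jvm ] [ElasticSearch-15] "
        "[gc][old][38932][3] duration [12.9s], collections [1]/[13s], "
        "total [12.9s]/[13s], memory [7.5gb]->[3.4gb]/[7.9gb]")

# Extract GC generation, pause duration, and heap before/after/total
m = re.search(r"\[gc\]\[(\w+)\].*?duration \[([\d.]+)s\].*?"
              r"memory \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]", line)
gc_type, duration, before, after, heap = m.groups()
print(gc_type, duration, before, after, heap)
# prints: old 12.9 7.5 3.4 7.9
```

Running it over the full log makes it easy to see how often old-gen collections fire and how much heap each one reclaims.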

Do you have a large cluster state and/or frequent cluster state updates (e.g. dynamic field updates, index creation)? If lots of garbage is produced, frequent/long garbage collections are hard to avoid.
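One quick way to gauge how large the cluster state actually is, is to fetch it from the _cluster/state endpoint and measure the serialized size (a sketch; the host/port are assumptions, adjust them to your deployment):

```python
import urllib.request

def size_mb(raw: bytes) -> float:
    """Serialized size in megabytes."""
    return len(raw) / (1024 * 1024)

def fetch_cluster_state(host: str = "http://localhost:9200") -> bytes:
    # _cluster/state returns the full cluster state as JSON.
    # (localhost:9200 is an assumption; point this at one of your nodes)
    return urllib.request.urlopen(host + "/_cluster/state").read()

# On a live cluster:
#   print("cluster state: %.1f MB" % size_mb(fetch_cluster_state()))
```

A state in the tens of megabytes that changes frequently means the master repeatedly serializes and publishes large documents, which churns a lot of heap.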

Maybe also check that swapping is disabled; having some of the process memory paged out to disk makes garbage collection slow.

Increasing the heap size makes major garbage collections less likely to happen, but it also has the side effect that each collection is more costly, since there are more objects to collect. So it might actually help to reduce the heap size instead of increasing it.
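As a rough back-of-envelope from the logs above (assuming the worst-case old-gen pause scales roughly with heap size, which is only an approximation for real collectors):

```python
# Old-GC pause observed in the log line above
pause_s = 12.9   # seconds
heap_gb = 7.9    # total heap reported in that line

seconds_per_gb = pause_s / heap_gb  # ~1.63 s per GB of heap

# Crude projection: smaller heap -> shorter worst-case pause,
# at the cost of more frequent collections
for target_gb in (4, 8, 10):
    print("%2dg heap -> ~%.1fs pause" % (target_gb, target_gb * seconds_per_gb))
```

This is only meant to illustrate the trade-off: a smaller heap cannot fix a workload that genuinely needs the memory, but for a master-only node with a modest cluster state it bounds how long any single pause can get.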

The cluster has 22 data nodes (AWS r3.2xlarge, 63gb memory each) and 3 non-data master servers (m4.xlarge, 16gb).
Why would the master servers consume so much memory when they are meant only for cluster/admin operations?
Swapping has been disabled on all nodes:
free -m
             total       used       free     shared    buffers     cached
Mem:         16078      13047       3031          0        237       1591
-/+ buffers/cache:      11218       4860
Swap:            0          0          0
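The swap line can also be checked programmatically, e.g. by parsing the free -m output (a small sketch using the output pasted above):

```python
# free -m output as pasted above
free_output = """\
             total       used       free     shared    buffers     cached
Mem:         16078      13047       3031          0        237       1591
-/+ buffers/cache:      11218       4860
Swap:            0          0          0
"""

# Pull out the Swap: row and check that no swap is configured or in use
swap_line = next(l for l in free_output.splitlines() if l.startswith("Swap:"))
total, used, unused = (int(x) for x in swap_line.split()[1:])
print("swap disabled:", total == 0 and used == 0)
# prints: swap disabled: True
```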

I will reduce the heap size on the master servers.

Indeed, they are supposed to require little memory, unless there is a high frequency of updates to a large cluster state.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.