For a long time now we have been facing regular ES cluster outages, generally when garbage collection kicks in, even though the data size is relatively small and enough memory appears to be available.
ES Setup:
ES version: 2.0.0
Cluster size: 2 nodes
Memory: 32 GB
Config:
ES Heap Size: 50% of RAM
index.number_of_shards: 2
index.number_of_replicas: 1
indices.fielddata.cache.size: 20%
bootstrap.mlockall: true
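For reference, 50% of 32 GB works out to a ~16 GB heap, which matches the [15.9gb] heap totals in the GC log below. Assuming the standard packaging, it is set via the environment variable (the exact defaults file may differ on your install):

ES_HEAP_SIZE=16g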
Heap usage is not consistently above 75%.
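For what it's worth, this is roughly how we check heap usage and whether mlockall actually took effect (host/port are placeholders):

curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'
curl 'localhost:9200/_nodes/process?pretty'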
Data:
Total number of open indices: 11
Data size: 6 GB
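Index counts and sizes can be checked with the _cat API, e.g.:

curl 'localhost:9200/_cat/indices?v&h=index,status,docs.count,store.size'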
[2016-02-08 16:34:22,033][WARN ][monitor.jvm ] [elastic-node-1] [gc][old][1913][491] duration [35.7s], collections [1]/[35.8s], total [35.7s]/[3.1h], memory [15.9gb]->[15.9gb]/[15.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [221.7kb]->[5.2mb]/[33.2mb]}{[old] [15.6gb]->[15.6gb]/[15.6gb]}
[2016-02-08 16:34:46,383][WARN ][monitor.jvm ] [elastic-node-1] [gc][old][1914][492] duration [24.2s], collections [1]/[24.3s], total [24.2s]/[3.1h], memory [15.9gb]->[15.9gb]/[15.9gb], all_pools {[young] [266.2mb]->[242.4mb]/[266.2mb]}{[survivor] [5.2mb]->[0b]/[33.2mb]}{[old] [15.6gb]->[15.6gb]/[15.6gb]}
[2016-02-08 16:35:20,045][WARN ][monitor.jvm ] [elastic-node-1] [gc][old][1915][493] duration [33.4s], collections [1]/[33.6s], total [33.4s]/[3.1h], memory [15.9gb]->[15.9gb]/[15.9gb], all_pools {[young] [242.4mb]->[266.2mb]/[266.2mb]}{[survivor] [0b]->[268.9kb]/[33.2mb]}{[old] [15.6gb]->[15.6gb]/[15.6gb]}
[2016-02-08 16:36:16,011][WARN ][monitor.jvm ] [elastic-node-1] [gc][old][1917][495] duration [33.8s], collections [1]/[33.9s], total [33.8s]/[3.1h], memory [15.9gb]->[15.9gb]/[15.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [17.1mb]->[26.5mb]/[33.2mb]}{[old] [15.6gb]->[15.6gb]/[15.6gb]}
[2016-02-08 16:36:36,249][WARN ][monitor.jvm ] [elastic-node-1] [gc][old][1918][496] duration [20.2s], collections [1]/[20.2s], total [20.2s]/[3.1h], memory [15.9gb]->[15.9gb]/[15.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [26.5mb]->[29.6mb]/[33.2mb]}{[old] [15.6gb]->[15.6gb]/[15.6gb]}
[2016-02-08 16:37:08,229][WARN ][monitor.jvm]
We close one index per day to save memory, keeping only the last 7 days of data; each index is 500-800 MB in size.
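The daily close uses the standard close-index API; a sketch of the call, with a placeholder index name:

curl -XPOST 'localhost:9200/logs-2016.02.01/_close'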
Questions:
- Why do these long-running GCs start even though there appears to be enough free memory available, considering the 6 GB data size?
- Are there any additional configurations required to avoid this stop-the-world situation?
Any help will be appreciated.