Elasticsearch garbage collection problem

Hi, I have a four-node cluster that collects NetFlow data, and every node has 16 GB of RAM, so I set the Java heap to 8 GB. After running for about three or four days, Elasticsearch starts spending a lot of time on garbage collection (roughly 30 minutes), which causes poor performance and dropped flows.
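
For reference, the heap is set in config/jvm.options on each node (a minimal sketch; the path assumes a default install layout):

# config/jvm.options
-Xms8g
-Xmx8g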

This is the log:

[2018-04-26T23:34:29,634][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189285] overhead, spent [548ms] collecting in the last [1.2s]
[2018-04-26T23:34:38,988][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189293][127678] duration [1.9s], collections [1]/[2.3s], total [1.9s]/[1.1h], memory [1.8gb]->[1.5gb]/[7.9gb], all_pools {[young] [362.5mb]->[6.5mb]/[532.5mb]}{[survivor] [66.5mb]->[26.1mb]/[66.5mb]}{[old] [1.4gb]->[1.4gb]/[7.3gb]}
[2018-04-26T23:34:38,988][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189293] overhead, spent [1.9s] collecting in the last [2.3s]
[2018-04-26T23:35:06,062][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189320][127701] duration [776ms], collections [1]/[1s], total [776ms]/[1.1h], memory [2.3gb]->[2gb]/[7.9gb], all_pools {[young] [396.6mb]->[10.6mb]/[532.5mb]}{[survivor] [63.4mb]->[40.3mb]/[66.5mb]}{[old] [1.9gb]->[1.9gb]/[7.3gb]}
[2018-04-26T23:35:06,063][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189320] overhead, spent [776ms] collecting in the last [1s]
[2018-04-26T23:35:08,955][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189322][127703] duration [934ms], collections [1]/[1.8s], total [934ms]/[1.1h], memory [2gb]->[2gb]/[7.9gb], all_pools {[young] [8.3mb]->[1.2mb]/[532.5mb]}{[survivor] [66.5mb]->[24.4mb]/[66.5mb]}{[old] [1.9gb]->[2gb]/[7.3gb]}
[2018-04-26T23:35:08,956][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189322] overhead, spent [934ms] collecting in the last [1.8s]
[2018-04-26T23:36:37,090][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189395][127769] duration [3.6s], collections [1]/[4.5s], total [3.6s]/[1.1h], memory [3.2gb]->[3.2gb]/[7.9gb], all_pools {[young] [31mb]->[1.4mb]/[532.5mb]}{[survivor] [61.7mb]->[65.7mb]/[66.5mb]}{[old] [3.1gb]->[3.1gb]/[7.3gb]}
[2018-04-26T23:36:37,090][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189395] overhead, spent [3.6s] collecting in the last [4.5s]
[2018-04-26T23:36:40,152][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189396][127770] duration [2.2s], collections [1]/[3s], total [2.2s]/[1.1h], memory [3.2gb]->[3.2gb]/[7.9gb], all_pools {[young] [1.4mb]->[4.1mb]/[532.5mb]}{[survivor] [65.7mb]->[31mb]/[66.5mb]}{[old] [3.1gb]->[3.2gb]/[7.3gb]}
[2018-04-26T23:36:40,152][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189396] overhead, spent [2.2s] collecting in the last [3s]
[2018-04-26T23:36:48,155][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189400][127774] duration [4s], collections [1]/[4.9s], total [4s]/[1.1h], memory [3.2gb]->[3.3gb]/[7.9gb], all_pools {[young] [220.9kb]->[18.7mb]/[532.5mb]}{[survivor] [65.9mb]->[50.3mb]/[66.5mb]}{[old] [3.2gb]->[3.2gb]/[7.3gb]}
[2018-04-26T23:37:07,968][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189414] overhead, spent [720ms] collecting in the last [1.1s]
[2018-04-26T23:37:09,969][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189416] overhead, spent [351ms] collecting in the last [1s]
[2018-04-26T23:37:13,613][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189419][127790] duration [961ms], collections [1]/[1.6s], total [961ms]/[1.1h], memory [3.8gb]->[3.6gb]/[7.9gb], all_pools {[young] [220.5mb]->[4.8mb]/[532.5mb]}{[survivor] [53.4mb]->[44.2mb]/[66.5mb]}{[old] [3.5gb]->[3.5gb]/[7.3gb]}
[2018-04-26T23:37:13,614][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189419] overhead, spent [961ms] collecting in the last [1.6s]
[2018-04-26T23:37:16,628][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189422][127792] duration [789ms], collections [1]/[1s], total [789ms]/[1.1h], memory [4gb]->[3.6gb]/[7.9gb], all_pools {[young] [450mb]->[3.3mb]/[532.5mb]}{[survivor] [55.3mb]->[64.4mb]/[66.5mb]}{[old] [3.5gb]->[3.6gb]/[7.3gb]}
[2018-04-26T23:37:16,628][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189422] overhead, spent [789ms] collecting in the last [1s]
[2018-04-26T23:37:19,765][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189424][127793] duration [2s], collections [1]/[2.1s], total [2s]/[1.1h], memory [4.1gb]->[3.6gb]/[7.9gb], all_pools {[young] [479.1mb]->[5.8mb]/[532.5mb]}{[survivor] [64.4mb]->[21.9mb]/[66.5mb]}{[old] [3.6gb]->[3.6gb]/[7.3gb]}
[2018-04-26T23:37:19,765][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189424] overhead, spent [2s] collecting in the last [2.1s]
[2018-04-26T23:37:22,767][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189427] overhead, spent [534ms] collecting in the last [1s]
[2018-04-26T23:37:24,889][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189428][127796] duration [1.4s], collections [1]/[2.1s], total [1.4s]/[1.1h], memory [3.9gb]->[3.7gb]/[7.9gb], all_pools {[young] [164.4mb]->[6.6mb]/[532.5mb]}{[survivor] [40.5mb]->[49.5mb]/[66.5mb]}{[old] [3.7gb]->[3.7gb]/[7.3gb]}
[2018-04-26T23:37:24,889][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189428] overhead, spent [1.4s] collecting in the last [2.1s]
[2018-04-26T23:37:43,410][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189443] overhead, spent [992ms] collecting in the last [1s]
[2018-04-26T23:37:48,582][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189447][127809] duration [1.6s], collections [1]/[2.1s], total [1.6s]/[1.1h], memory [4.2gb]->[4gb]/[7.9gb], all_pools {[young] [253.9mb]->[705.4kb]/[532.5mb]}{[survivor] [36.1mb]->[66.5mb]/[66.5mb]}{[old] [3.9gb]->[3.9gb]/[7.3gb]}
[2018-04-26T23:37:48,582][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189447] overhead, spent [1.6s] collecting in the last [2.1s]
[2018-04-26T23:38:11,798][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189470] overhead, spent [603ms] collecting in the last [1s]
[2018-04-26T23:38:16,160][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189474] overhead, spent [377ms] collecting in the last [1.3s]
[2018-04-26T23:38:30,168][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189488] overhead, spent [260ms] collecting in the last [1s]
[2018-04-26T23:39:14,001][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189529] overhead, spent [3.2s] collecting in the last [3.5s]

This problem puzzles me a lot.
Is there any way to improve the situation?
Thank you in advance. :slight_smile:

Which version?

ES 6.2.2

How many indices and shards?

The number of shards per index is the default: 5.

You can probably reduce the number of shards a bit. Since you have at most 82 GB per index, maybe 2 or 3 shards would be enough?

That should reduce the pressure on the nodes, in my opinion.

Another thing you can do in the short term is to add a new node if possible.
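
To see the current index sizes and primary shard counts, the cat APIs are handy (a quick sketch, assuming your indices follow the logstash-* naming pattern used below):

GET _cat/indices/logstash-*?v&h=index,pri,store.size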

Is this the correct REST API call to set 3 primary shards?

PUT _template/logstash
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "number_of_shards": 3
  }
}

After running the command, should I restart all the nodes?

Thank you very much. :slight_smile:

This looks correct, but it will only apply to new indices.
You don't have to restart.
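
If you want to confirm the template was stored, you can fetch it back (same console syntax as above):

GET _template/logstash

The lower shard count will then take effect when the next logstash-* index is created.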
