Elasticsearch garbage collection problem

Hi, I have a four-node cluster that collects NetFlow data, and every node has 16 GB of RAM, so I set the Java heap to 8 GB. After running for about three or four days, Elasticsearch starts spending a lot of time on garbage collection (roughly 30 minutes), which causes poor performance and dropped flows.
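
For reference, the heap is set in config/jvm.options on each node (a minimal sketch; the path assumes a default install layout):

# config/jvm.options
-Xms8g
-Xmx8g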

This is the log:

[2018-04-26T23:34:29,634][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189285] overhead, spent [548ms] collecting in the last [1.2s]
[2018-04-26T23:34:38,988][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189293][127678] duration [1.9s], collections [1]/[2.3s], total [1.9s]/[1.1h], memory [1.8gb]->[1.5gb]/[7.9gb], all_pools {[young] [362.5mb]->[6.5mb]/[532.5mb]}{[survivor] [66.5mb]->[26.1mb]/[66.5mb]}{[old] [1.4gb]->[1.4gb]/[7.3gb]}
[2018-04-26T23:34:38,988][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189293] overhead, spent [1.9s] collecting in the last [2.3s]
[2018-04-26T23:35:06,062][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189320][127701] duration [776ms], collections [1]/[1s], total [776ms]/[1.1h], memory [2.3gb]->[2gb]/[7.9gb], all_pools {[young] [396.6mb]->[10.6mb]/[532.5mb]}{[survivor] [63.4mb]->[40.3mb]/[66.5mb]}{[old] [1.9gb]->[1.9gb]/[7.3gb]}
[2018-04-26T23:35:06,063][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189320] overhead, spent [776ms] collecting in the last [1s]
[2018-04-26T23:35:08,955][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189322][127703] duration [934ms], collections [1]/[1.8s], total [934ms]/[1.1h], memory [2gb]->[2gb]/[7.9gb], all_pools {[young] [8.3mb]->[1.2mb]/[532.5mb]}{[survivor] [66.5mb]->[24.4mb]/[66.5mb]}{[old] [1.9gb]->[2gb]/[7.3gb]}
[2018-04-26T23:35:08,956][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189322] overhead, spent [934ms] collecting in the last [1.8s]
[2018-04-26T23:36:37,090][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189395][127769] duration [3.6s], collections [1]/[4.5s], total [3.6s]/[1.1h], memory [3.2gb]->[3.2gb]/[7.9gb], all_pools {[young] [31mb]->[1.4mb]/[532.5mb]}{[survivor] [61.7mb]->[65.7mb]/[66.5mb]}{[old] [3.1gb]->[3.1gb]/[7.3gb]}
[2018-04-26T23:36:37,090][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189395] overhead, spent [3.6s] collecting in the last [4.5s]
[2018-04-26T23:36:40,152][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189396][127770] duration [2.2s], collections [1]/[3s], total [2.2s]/[1.1h], memory [3.2gb]->[3.2gb]/[7.9gb], all_pools {[young] [1.4mb]->[4.1mb]/[532.5mb]}{[survivor] [65.7mb]->[31mb]/[66.5mb]}{[old] [3.1gb]->[3.2gb]/[7.3gb]}
[2018-04-26T23:36:40,152][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189396] overhead, spent [2.2s] collecting in the last [3s]
[2018-04-26T23:36:48,155][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189400][127774] duration [4s], collections [1]/[4.9s], total [4s]/[1.1h], memory [3.2gb]->[3.3gb]/[7.9gb], all_pools {[young] [220.9kb]->[18.7mb]/[532.5mb]}{[survivor] [65.9mb]->[50.3mb]/[66.5mb]}{[old] [3.2gb]->[3.2gb]/[7.3gb]}
[2018-04-26T23:37:07,968][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189414] overhead, spent [720ms] collecting in the last [1.1s]
[2018-04-26T23:37:09,969][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189416] overhead, spent [351ms] collecting in the last [1s]
[2018-04-26T23:37:13,613][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189419][127790] duration [961ms], collections [1]/[1.6s], total [961ms]/[1.1h], memory [3.8gb]->[3.6gb]/[7.9gb], all_pools {[young] [220.5mb]->[4.8mb]/[532.5mb]}{[survivor] [53.4mb]->[44.2mb]/[66.5mb]}{[old] [3.5gb]->[3.5gb]/[7.3gb]}
[2018-04-26T23:37:13,614][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189419] overhead, spent [961ms] collecting in the last [1.6s]
[2018-04-26T23:37:16,628][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189422][127792] duration [789ms], collections [1]/[1s], total [789ms]/[1.1h], memory [4gb]->[3.6gb]/[7.9gb], all_pools {[young] [450mb]->[3.3mb]/[532.5mb]}{[survivor] [55.3mb]->[64.4mb]/[66.5mb]}{[old] [3.5gb]->[3.6gb]/[7.3gb]}
[2018-04-26T23:37:16,628][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189422] overhead, spent [789ms] collecting in the last [1s]
[2018-04-26T23:37:19,765][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189424][127793] duration [2s], collections [1]/[2.1s], total [2s]/[1.1h], memory [4.1gb]->[3.6gb]/[7.9gb], all_pools {[young] [479.1mb]->[5.8mb]/[532.5mb]}{[survivor] [64.4mb]->[21.9mb]/[66.5mb]}{[old] [3.6gb]->[3.6gb]/[7.3gb]}
[2018-04-26T23:37:19,765][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189424] overhead, spent [2s] collecting in the last [2.1s]
[2018-04-26T23:37:22,767][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189427] overhead, spent [534ms] collecting in the last [1s]
[2018-04-26T23:37:24,889][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189428][127796] duration [1.4s], collections [1]/[2.1s], total [1.4s]/[1.1h], memory [3.9gb]->[3.7gb]/[7.9gb], all_pools {[young] [164.4mb]->[6.6mb]/[532.5mb]}{[survivor] [40.5mb]->[49.5mb]/[66.5mb]}{[old] [3.7gb]->[3.7gb]/[7.3gb]}
[2018-04-26T23:37:24,889][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189428] overhead, spent [1.4s] collecting in the last [2.1s]
[2018-04-26T23:37:43,410][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189443] overhead, spent [992ms] collecting in the last [1s]
[2018-04-26T23:37:48,582][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][young][189447][127809] duration [1.6s], collections [1]/[2.1s], total [1.6s]/[1.1h], memory [4.2gb]->[4gb]/[7.9gb], all_pools {[young] [253.9mb]->[705.4kb]/[532.5mb]}{[survivor] [36.1mb]->[66.5mb]/[66.5mb]}{[old] [3.9gb]->[3.9gb]/[7.3gb]}
[2018-04-26T23:37:48,582][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189447] overhead, spent [1.6s] collecting in the last [2.1s]
[2018-04-26T23:38:11,798][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189470] overhead, spent [603ms] collecting in the last [1s]
[2018-04-26T23:38:16,160][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189474] overhead, spent [377ms] collecting in the last [1.3s]
[2018-04-26T23:38:30,168][INFO ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189488] overhead, spent [260ms] collecting in the last [1s]
[2018-04-26T23:39:14,001][WARN ][o.e.m.j.JvmGcMonitorService] [ES4] [gc][189529] overhead, spent [3.2s] collecting in the last [3.5s]

This problem puzzles me a lot.
Is there any way to improve the situation?
Thank you in advance. :slight_smile:

Which version?

ES 6.2.2

How many indices and shards?

The number of shards per index is the default: 5.

You can probably reduce the number of shards a bit. Since you have at most 82 GB per index, maybe 2 or 3 shards would be enough?

That should reduce the pressure on the nodes, in my opinion.

Another thing you can do in the short term is to add a new node if possible.
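
To see the current index sizes and primary shard counts, the cat APIs are handy (a quick sketch, assuming your indices follow the logstash-* naming pattern used below):

GET _cat/indices/logstash-*?v&h=index,pri,store.size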

Is this the correct REST API call to set 3 primary shards?

PUT _template/logstash
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "number_of_shards": 3
  }
}

After running the command, should I restart all the nodes?

Thank you very much. :slight_smile:

This looks correct, but it will only apply to new indices.
You don't have to restart.
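
If you want to confirm the template was stored, you can fetch it back (same console syntax as above):

GET _template/logstash

The lower shard count will then take effect when the next logstash-* index is created.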
