Frequent GC on one node

andrewliuxxx · April 24, 2018, 3:03am

I have a cluster of 120 nodes on 60 machines (2 nodes per machine), running version 6.1.3.
The cluster does nothing but indexing, at a rate of 1 million records per second.
Several nodes run out of memory about 50 minutes after a restart.

[2018-04-23T22:46:57,115][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][982][189] duration [922ms], collections [1]/[1.7s], total [922ms]/[1.8m], memory [16.6gb]->[17.4gb]/[30.8gb], all_pools {[young] [157mb]->[463.7mb]/[1.4gb]}{[survivor] [36.4mb]->[191.3mb]/[191.3mb]}{[old] [16.4gb]->[16.8gb]/[29.1gb]}
[2018-04-23T22:46:57,115][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][982] overhead, spent [922ms] collecting in the last [1.7s]
[2018-04-23T22:46:59,171][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][983][190] duration [937ms], collections [1]/[2s], total [937ms]/[1.8m], memory [17.4gb]->[18.7gb]/[30.8gb], all_pools {[young] [463.7mb]->[1.1gb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [16.8gb]->[17.4gb]/[29.1gb]}
[2018-04-23T22:47:00,451][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][984][191] duration [987ms], collections [1]/[1.2s], total [987ms]/[1.9m], memory [18.7gb]->[18.5gb]/[30.8gb], all_pools {[young] [1.1gb]->[323.9mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [17.4gb]->[18gb]/[29.1gb]}
[2018-04-23T22:47:04,254][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][985][193] duration [2.1s], collections [2]/[3.8s], total [2.1s]/[1.9m], memory [18.5gb]->[20.3gb]/[30.8gb], all_pools {[young] [323.9mb]->[1gb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [18gb]->[19.1gb]/[29.1gb]}
[2018-04-23T22:47:04,254][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][985] overhead, spent [2.1s] collecting in the last [3.8s]
[2018-04-23T22:47:05,879][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][986][194] duration [1.2s], collections [1]/[1.6s], total [1.2s]/[1.9m], memory [20.3gb]->[20gb]/[30.8gb], all_pools {[young] [1gb]->[24.8mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.1gb]->[19.8gb]/[29.1gb]}
[2018-04-23T22:47:08,028][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][987][196] duration [1.4s], collections [2]/[1s], total [1.4s]/[1.9m], memory [20gb]->[21.7gb]/[30.8gb], all_pools {[young] [24.8mb]->[24.3mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.8gb]->[20.7gb]/[29.1gb]}
[2018-04-23T22:47:10,096][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][988][197] duration [1.2s], collections [1]/[3.2s], total [1.2s]/[2m], memory [21.7gb]->[22.2gb]/[30.8gb], all_pools {[young] [24.3mb]->[728.5mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [20.7gb]->[21.3gb]/[29.1gb]}
[2018-04-23T22:47:11,325][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][989][198] duration [941ms], collections [1]/[1.2s], total [941ms]/[2m], memory [22.2gb]->[22.2gb]/[30.8gb], all_pools {[young] [728.5mb]->[772.2kb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [21.3gb]->[22gb]/[29.1gb]}
[2018-04-23T22:47:13,242][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][990][199] duration [1s], collections [1]/[1.9s], total [1s]/[2m], memory [22.2gb]->[23.5gb]/[30.8gb], all_pools {[young] [772.2kb]->[716.1mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [22gb]->[22.6gb]/[29.1gb]}
[2018-04-23T22:47:14,448][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][991][200] duration [977ms], collections [1]/[1.2s], total [977ms]/[2m], memory [23.5gb]->[23.4gb]/[30.8gb], all_pools {[young] [716.1mb]->[3.4mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [22.6gb]->[23.2gb]/[29.1gb]}
[2018-04-23T22:47:16,289][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][992][201] duration [901ms], collections [1]/[1.8s], total [901ms]/[2m], memory [23.4gb]->[25.1gb]/[30.8gb], all_pools {[young] [3.4mb]->[1gb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [23.2gb]->[23.8gb]/[29.1gb]}
[2018-04-23T22:47:17,718][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][993][202] duration [1s], collections [1]/[1.4s], total [1s]/[2m], memory [25.1gb]->[25.4gb]/[30.8gb], all_pools {[young] [1gb]->[793.2mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [23.8gb]->[24.4gb]/[29.1gb]}
[2018-04-23T22:47:19,104][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][994][203] duration [1.1s], collections [1]/[1.3s], total [1.1s]/[2.1m], memory [25.4gb]->[25.3gb]/[30.8gb], all_pools {[young] [793.2mb]->[9.3mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [24.4gb]->[25.1gb]/[29.1gb]}
[2018-04-23T22:47:19,104][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][994] overhead, spent [1.1s] collecting in the last [1.3s]
[2018-04-23T22:47:21,634][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][995][205] duration [1.7s], collections [2]/[2.5s], total [1.7s]/[2.1m], memory [25.3gb]->[26.1gb]/[30.8gb], all_pools {[young] [9.3mb]->[22mb]/[1.4gb]}{[survivor] [191.3mb]->[168.1mb]/[191.3mb]}{[old] [25.1gb]->[26gb]/[29.1gb]}
[2018-04-23T22:47:23,645][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][997][206] duration [814ms], collections [1]/[1s], total [814ms]/[2.1m], memory [27.5gb]->[26.6gb]/[30.8gb], all_pools {[young] [1.3gb]->[175.7mb]/[1.4gb]}{[survivor] [168.1mb]->[191.3mb]/[191.3mb]}{[old] [26gb]->[26.3gb]/[29.1gb]}
[2018-04-23T22:47:29,583][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][1002][208] duration [1.1s], collections [1]/[1.2s], total [1.1s]/[2.1m], memory [28.2gb]->[27.4gb]/[30.8gb], all_pools {[young] [1.4gb]->[39.3mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [26.6gb]->[27.2gb]/[29.1gb]}
[2018-04-23T22:47:31,290][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][1003][209] duration [1s], collections [1]/[1.7s], total [1s]/[2.1m], memory [27.4gb]->[28.3gb]/[30.8gb], all_pools {[young] [39.3mb]->[251.7mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [27.2gb]->[27.9gb]/[29.1gb]}

The other nodes are normal, including the other node on the same machine as the faulty one.
Can anyone help me?
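
To put a number on how fast the old generation fills up in the log lines above, here is a minimal Python sketch. It assumes the JvmGcMonitorService line layout shown in this post and binary size units (1 gb = 1024^3 bytes); the function names are my own and not part of Elasticsearch.

```python
import re
from datetime import datetime

# Timestamp at the start of the line, plus the {[old] [before]->[after]/[max]} pool.
OLD_POOL_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*"
    r"\{\[old\] \[(?P<before>[^\]]+)\]->\[(?P<after>[^\]]+)\]/\[(?P<max>[^\]]+)\]\}"
)
SIZE_RE = re.compile(r"([\d.]+)(b|kb|mb|gb)")
# Assumption: sizes in these logs use binary multiples.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def to_bytes(text):
    """Convert a size such as '16.8gb' or '772.2kb' to a number of bytes."""
    match = SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_old_gen(line):
    """Return (timestamp, old_used_after_gc, old_max) or None if the line has no [old] pool."""
    match = OLD_POOL_RE.match(line)
    if match is None:
        return None  # e.g. the "[gc][982] overhead, spent ..." lines
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
    return ts, to_bytes(match.group("after")), to_bytes(match.group("max"))


def old_gen_growth_rate(lines):
    """Bytes per second gained by the old generation between the first and last sample."""
    samples = [s for s in map(parse_old_gen, lines) if s is not None]
    if len(samples) < 2:
        raise ValueError("need at least two GC lines with an [old] pool")
    (t0, used0, _), (t1, used1, _) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (used1 - used0) / elapsed
```

Between the first and last lines above, the old generation goes from 16.8gb to 27.9gb in about 34 seconds, so it grows by roughly 0.3 GB per second and young collections are not freeing it.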

andrewliuxxx · April 27, 2018, 3:36am

From MAT, we get: "64 instances of "io.netty.buffer.PoolArena$HeapArena", loaded by "java.net.FactoryURLClassLoader @ 0x1dfba4288" occupy 1,075,000,320 (96.28%) bytes." It seems something is wrong with Netty in ES.

dadoonet · April 27, 2018, 3:50am

Could you upgrade to the latest 6.2 please? I think this problem has been solved.

andrewliuxxx · April 27, 2018, 9:19am

@dadoonet
Thank you very much for your suggestion.
Could you please explain the reason? Is the problem a bug? I cannot find it in the 6.2 release notes.

andrewliuxxx · April 28, 2018, 8:43am

@dadoonet
We upgraded to 6.2.4 and found the problem is still there. But when we lower the indexing rate, the frequent GC disappears.

dadoonet · April 30, 2018, 6:50am

Could you run:

GET /_cat/nodes?v&h=id,r,hc,hm,hp,rc,rm,rp,fdc,fdm,fdp,fm,qcm,rcm,sc,sm,siwm,svmm,sfbm
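
If it is easier to script than to run in a console, the same request can be sketched in Python with only the standard library. This assumes a node reachable over plain HTTP without authentication (for example `http://localhost:9200`, which is a placeholder); the function names are illustrative, and the column descriptions are my reading of the cat nodes documentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Short column names from the request above, with the metric each one selects.
COLUMNS = {
    "id": "node id",
    "r": "node role",
    "hc": "heap.current",
    "hm": "heap.max",
    "hp": "heap.percent",
    "rc": "ram.current",
    "rm": "ram.max",
    "rp": "ram.percent",
    "fdc": "file_desc.current",
    "fdm": "file_desc.max",
    "fdp": "file_desc.percent",
    "fm": "fielddata.memory_size",
    "qcm": "query_cache.memory_size",
    "rcm": "request_cache.memory_size",
    "sc": "segments.count",
    "sm": "segments.memory",
    "siwm": "segments.index_writer_memory",
    "svmm": "segments.version_map_memory",
    "sfbm": "segments.fixed_bitset_memory",
}


def cat_nodes_url(base_url):
    """Build the _cat/nodes URL with a header row (v) and the columns above (h)."""
    query = urlencode({"v": "true", "h": ",".join(COLUMNS)}, safe=",")
    return f"{base_url.rstrip('/')}/_cat/nodes?{query}"


def fetch_cat_nodes(base_url, timeout=10.0):
    """Return the plain-text table, one row per node. Needs a reachable cluster."""
    with urlopen(cat_nodes_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `print(fetch_cat_nodes("http://localhost:9200"))` against one of the nodes should give a table that can be pasted back here, so the heap and segment memory of the faulty node can be compared with the healthy ones.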

system · May 28, 2018, 6:50am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
