Frequent GC on one node

I have a cluster with 120 nodes on 60 machines (2 nodes per machine), running version 6.1.3.
The cluster does nothing but indexing, at a rate of 1 million records per second.
Several nodes run out of memory about 50 minutes after being restarted.
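
For reference, the heap climb on an affected node can be followed with the standard nodes stats API (EsNode2 is the node name from the logs below; filter_path just trims the response):

GET /_nodes/EsNode2/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors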

[2018-04-23T22:46:57,115][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][982][189] duration [922ms], collections [1]/[1.7s], total [922ms]/[1.8m], memory [16.6gb]->[17.4gb]/[30.8gb], all_pools {[young] [157mb]->[463.7mb]/[1.4gb]}{[survivor] [36.4mb]->[191.3mb]/[191.3mb]}{[old] [16.4gb]->[16.8gb]/[29.1gb]}
[2018-04-23T22:46:57,115][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][982] overhead, spent [922ms] collecting in the last [1.7s]
[2018-04-23T22:46:59,171][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][983][190] duration [937ms], collections [1]/[2s], total [937ms]/[1.8m], memory [17.4gb]->[18.7gb]/[30.8gb], all_pools {[young] [463.7mb]->[1.1gb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [16.8gb]->[17.4gb]/[29.1gb]}
[2018-04-23T22:47:00,451][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][984][191] duration [987ms], collections [1]/[1.2s], total [987ms]/[1.9m], memory [18.7gb]->[18.5gb]/[30.8gb], all_pools {[young] [1.1gb]->[323.9mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [17.4gb]->[18gb]/[29.1gb]}
[2018-04-23T22:47:04,254][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][985][193] duration [2.1s], collections [2]/[3.8s], total [2.1s]/[1.9m], memory [18.5gb]->[20.3gb]/[30.8gb], all_pools {[young] [323.9mb]->[1gb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [18gb]->[19.1gb]/[29.1gb]}
[2018-04-23T22:47:04,254][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][985] overhead, spent [2.1s] collecting in the last [3.8s]
[2018-04-23T22:47:05,879][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][986][194] duration [1.2s], collections [1]/[1.6s], total [1.2s]/[1.9m], memory [20.3gb]->[20gb]/[30.8gb], all_pools {[young] [1gb]->[24.8mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.1gb]->[19.8gb]/[29.1gb]}
[2018-04-23T22:47:08,028][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][987][196] duration [1.4s], collections [2]/[1s], total [1.4s]/[1.9m], memory [20gb]->[21.7gb]/[30.8gb], all_pools {[young] [24.8mb]->[24.3mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.8gb]->[20.7gb]/[29.1gb]}
[2018-04-23T22:47:10,096][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][988][197] duration [1.2s], collections [1]/[3.2s], total [1.2s]/[2m], memory [21.7gb]->[22.2gb]/[30.8gb], all_pools {[young] [24.3mb]->[728.5mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [20.7gb]->[21.3gb]/[29.1gb]}
[2018-04-23T22:47:11,325][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][989][198] duration [941ms], collections [1]/[1.2s], total [941ms]/[2m], memory [22.2gb]->[22.2gb]/[30.8gb], all_pools {[young] [728.5mb]->[772.2kb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [21.3gb]->[22gb]/[29.1gb]}
[2018-04-23T22:47:13,242][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][990][199] duration [1s], collections [1]/[1.9s], total [1s]/[2m], memory [22.2gb]->[23.5gb]/[30.8gb], all_pools {[young] [772.2kb]->[716.1mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [22gb]->[22.6gb]/[29.1gb]}
[2018-04-23T22:47:14,448][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][991][200] duration [977ms], collections [1]/[1.2s], total [977ms]/[2m], memory [23.5gb]->[23.4gb]/[30.8gb], all_pools {[young] [716.1mb]->[3.4mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [22.6gb]->[23.2gb]/[29.1gb]}
[2018-04-23T22:47:16,289][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][992][201] duration [901ms], collections [1]/[1.8s], total [901ms]/[2m], memory [23.4gb]->[25.1gb]/[30.8gb], all_pools {[young] [3.4mb]->[1gb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [23.2gb]->[23.8gb]/[29.1gb]}
[2018-04-23T22:47:17,718][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][993][202] duration [1s], collections [1]/[1.4s], total [1s]/[2m], memory [25.1gb]->[25.4gb]/[30.8gb], all_pools {[young] [1gb]->[793.2mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [23.8gb]->[24.4gb]/[29.1gb]}
[2018-04-23T22:47:19,104][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][994][203] duration [1.1s], collections [1]/[1.3s], total [1.1s]/[2.1m], memory [25.4gb]->[25.3gb]/[30.8gb], all_pools {[young] [793.2mb]->[9.3mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [24.4gb]->[25.1gb]/[29.1gb]}
[2018-04-23T22:47:19,104][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][994] overhead, spent [1.1s] collecting in the last [1.3s]
[2018-04-23T22:47:21,634][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][995][205] duration [1.7s], collections [2]/[2.5s], total [1.7s]/[2.1m], memory [25.3gb]->[26.1gb]/[30.8gb], all_pools {[young] [9.3mb]->[22mb]/[1.4gb]}{[survivor] [191.3mb]->[168.1mb]/[191.3mb]}{[old] [25.1gb]->[26gb]/[29.1gb]}
[2018-04-23T22:47:23,645][INFO ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][997][206] duration [814ms], collections [1]/[1s], total [814ms]/[2.1m], memory [27.5gb]->[26.6gb]/[30.8gb], all_pools {[young] [1.3gb]->[175.7mb]/[1.4gb]}{[survivor] [168.1mb]->[191.3mb]/[191.3mb]}{[old] [26gb]->[26.3gb]/[29.1gb]}
[2018-04-23T22:47:29,583][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][1002][208] duration [1.1s], collections [1]/[1.2s], total [1.1s]/[2.1m], memory [28.2gb]->[27.4gb]/[30.8gb], all_pools {[young] [1.4gb]->[39.3mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [26.6gb]->[27.2gb]/[29.1gb]}
[2018-04-23T22:47:31,290][WARN ][o.e.m.j.JvmGcMonitorService] [EsNode2] [gc][young][1003][209] duration [1s], collections [1]/[1.7s], total [1s]/[2.1m], memory [27.4gb]->[28.3gb]/[30.8gb], all_pools {[young] [39.3mb]->[251.7mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [27.2gb]->[27.9gb]/[29.1gb]}

The other nodes are normal, including the second node on the same machine as the faulty one.
Can anyone help me?

From MAT (Eclipse Memory Analyzer) we get: "64 instances of "io.netty.buffer.PoolArena$HeapArena", loaded by "java.net.FactoryURLClassLoader @ 0x1dfba4288" occupy 1,075,000,320 (96.28%) bytes." It seems something is wrong with Netty inside Elasticsearch.
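
For anyone else debugging this: the arenas in that dominator report come from Netty's pooled buffer allocator. Netty exposes standard system properties that can constrain or disable pooling, which can be set in jvm.options; the lines below only illustrate the knobs (the values are not a recommendation), and Elasticsearch already ships some Netty-related flags such as -Dio.netty.recycler.maxCapacityPerThread=0 in its default jvm.options.

# illustrative only: Netty allocator system properties, set in jvm.options
-Dio.netty.allocator.numHeapArenas=8
-Dio.netty.allocator.type=unpooled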

Could you upgrade to the latest 6.2, please? I think this problem has been solved there.

@dadoonet
Thank you very much for your suggestion.
Could you point me to the reason? Is the problem a bug? I cannot find it in the release notes of 6.2.

@dadoonet
We upgraded to 6.2.4 and found the problem is still there. However, when we lower the indexing rate, the frequent GC disappears.
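
In case it is useful to others: while tuning the ingest rate, bulk rejections and queue depth can be watched with something like the request below (on 6.2 the relevant thread pool is still named bulk).

GET /_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected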

Could you run:

GET /_cat/nodes?v&h=id,r,hc,hm,hp,rc,rm,rp,fdc,fdm,fdp,fm,qcm,rcm,sc,sm,siwm,svmm,sfbm
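
For readers finding this later: those h= aliases expand to heap.current/heap.max/heap.percent, ram.current/ram.max/ram.percent, file_desc.*, fielddata.memory_size, query_cache.memory_size, request_cache.memory_size, segments.count, segments.memory, segments.index_writer_memory, segments.version_map_memory and segments.fixed_bitset_memory, i.e. a per-node breakdown of where the heap is going.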
