Cluster details:
Elasticsearch version: 6.3.0
Java version: 1.8.0_191
54 data nodes
Each bare-metal (BM) host is split into 2 VMs. Each VM has 128 GB RAM, a 31 GB heap, and 18 cores.
3 master nodes
JVM options:
-Xms31744m
-Xmx31744m
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxNewSize=16168m
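As a sanity check on these options, here is a rough sketch of how the heap would be divided. It assumes the JVM default -XX:SurvivorRatio=8 (I have not set it) and that the young generation grows to MaxNewSize; the function name is just for illustration. The real JVM aligns pool sizes, so the results are approximate, but they come out close to the pool maximums reported in the GC log lines (young 12.6gb, survivor 1.5gb, old 15.2gb, total 29.4gb).

```python
# Rough heap layout implied by -Xmx31744m and -XX:MaxNewSize=16168m.
# ASSUMPTION: SurvivorRatio is the JVM default of 8 (eden : one survivor = 8 : 1),
# and the young generation is sized at MaxNewSize.

XMX_MB = 31744        # from -Xmx31744m
MAX_NEW_MB = 16168    # from -XX:MaxNewSize=16168m


def pool_sizes_gb(xmx_mb, max_new_mb, survivor_ratio=8):
    """Return approximate pool sizes in GB, rounded to one decimal."""
    # Young generation = eden + 2 survivor spaces, eden = ratio * survivor.
    survivor_mb = max_new_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    old_mb = xmx_mb - max_new_mb
    # Only one survivor space holds objects at a time, so the usable
    # heap is the total minus one survivor space.
    usable_mb = xmx_mb - survivor_mb
    sizes = {
        "eden": eden_mb,
        "survivor": survivor_mb,
        "old": old_mb,
        "usable_total": usable_mb,
    }
    return {name: round(mb / 1024, 1) for name, mb in sizes.items()}


SIZES = pool_sizes_gb(XMX_MB, MAX_NEW_MB)
for name, gb in SIZES.items():
    print(f"{name}: ~{gb} GB")
```

With these settings roughly half of the heap is young generation, which leaves about 15 GB for the old generation.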
Frequent garbage collection is badly impacting the performance of the cluster.
I tried different young-generation sizes, ranging from 1 GB to 16 GB of heap.
With every setting, I see garbage collection being triggered every second:
[2019-01-10T11:02:25,733][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][53][6] duration [1s], collections [1]/[1.6s], total [1s]/[2.4s], memory [16.1gb]->[6gb]/[29.4gb], all_pools {[young] [11.6gb]->[617.2mb]/[12.6gb]}{[survivor] [618.9mb]->[1.5gb]/[1.5gb]}{[old] [3.8gb]->[3.8gb]/[15.2gb]}
[2019-01-10T11:02:25,735][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][53] overhead, spent [1s] collecting in the last [1.6s]
[2019-01-10T11:02:33,981][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][59][8] duration [2.6s], collections [2]/[3.2s], total [2.6s]/[5s], memory [16.5gb]->[7.3gb]/[29.4gb], all_pools {[young] [11gb]->[147.3mb]/[12.6gb]}{[survivor] [1.5gb]->[406.2mb]/[1.5gb]}{[old] [3.8gb]->[6.8gb]/[15.2gb]}
[2019-01-10T11:02:33,997][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][59] overhead, spent [2.6s] collecting in the last [3.2s]
[2019-01-10T11:02:46,927][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][71][10] duration [906ms], collections [1]/[1.8s], total [906ms]/[6.1s], memory [16.7gb]->[9.8gb]/[29.4gb], all_pools {[young] [8.4gb]->[103.3mb]/[12.6gb]}{[survivor] [1.3gb]->[1.5gb]/[1.5gb]}{[old] [6.8gb]->[8.1gb]/[15.2gb]}
[2019-01-10T11:02:46,930][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][71] overhead, spent [906ms] collecting in the last [1.8s]
[2019-01-10T11:02:58,339][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][82][11] duration [1.2s], collections [1]/[1.4s], total [1.2s]/[7.4s], memory [21.9gb]->[11gb]/[29.4gb], all_pools {[young] [12.1gb]->[126mb]/[12.6gb]}{[survivor] [1.5gb]->[1.4gb]/[1.5gb]}{[old] [8.1gb]->[9.4gb]/[15.2gb]}
[2019-01-10T11:02:58,341][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][82] overhead, spent [1.2s] collecting in the last [1.4s]
[2019-01-10T11:03:13,347][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][97] overhead, spent [259ms] collecting in the last [1s]
[2019-01-10T11:03:24,163][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][107][13] duration [1.5s], collections [1]/[1.8s], total [1.5s]/[9.1s], memory [22.6gb]->[10.9gb]/[29.4gb], all_pools {[young] [12gb]->[81.5mb]/[12.6gb]}{[survivor] [1.1gb]->[915.9mb]/[1.5gb]}{[old] [9.4gb]->[9.9gb]/[15.2gb]}
[2019-01-10T11:03:24,164][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][107] overhead, spent [1.5s] collecting in the last [1.8s]
[2019-01-10T11:03:31,384][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][114] overhead, spent [399ms] collecting in the last [1.2s]
[2019-01-10T11:04:27,553][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][170] overhead, spent [657ms] collecting in the last [1s]
[2019-01-10T11:04:42,564][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][185] overhead, spent [273ms] collecting in the last [1s]
[2019-01-10T11:04:50,847][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][193][19] duration [1s], collections [1]/[1.2s], total [1s]/[11.8s], memory [23gb]->[10.7gb]/[29.4gb], all_pools {[young] [12.5gb]->[248.6mb]/[12.6gb]}{[survivor] [418.2mb]->[468.9mb]/[1.5gb]}{[old] [10gb]->[10gb]/[15.2gb]}
[2019-01-10T11:04:50,851][WARN ][o.e.m.j.JvmGcMonitorService] [####] [gc][193] overhead, spent [1s] collecting in the last [1.2s]
[2019-01-10T11:05:15,877][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][218] overhead, spent [322ms] collecting in the last [1s]
[2019-01-10T11:05:44,959][INFO ][o.e.m.j.JvmGcMonitorService] [####] [gc][young][247][22] duration [957ms], collections [1]/[1s], total [957ms]/[13.1s], memory [22.9gb]->[10.6gb]/[29.4gb], all_pools {[young] [12.4gb]->[225.1mb]/[12.6gb]}{[survivor] [437.5mb]->[379.6mb]/[1.5gb]}{[old] [10gb]->[10gb]/[15.2gb]}
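To put a number on the "overhead" lines above, here is a minimal sketch that extracts the time spent collecting and the observation window from each line and computes the fraction of time lost to GC. The regular expression is my own and only handles the `ms` and `s` units that appear in this excerpt; the helper names are hypothetical.

```python
import re

# Matches e.g. "[gc][53] overhead, spent [1s] collecting in the last [1.6s]".
# ASSUMPTION: durations are only ever printed in "ms" or "s", as in the excerpt.
OVERHEAD_RE = re.compile(
    r"\[gc\]\[(?P<seq>\d+)\] overhead, "
    r"spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s)\]"
)


def to_seconds(value, unit):
    """Convert a value such as ('906', 'ms') or ('1.6', 's') to seconds."""
    seconds = float(value)
    return seconds / 1000.0 if unit == "ms" else seconds


def gc_overhead(lines):
    """Return (sequence, spent_s, window_s, fraction) for each overhead line."""
    results = []
    for line in lines:
        match = OVERHEAD_RE.search(line)
        if match is None:
            continue  # not an overhead line (e.g. a [gc][young] detail line)
        spent = to_seconds(match["spent"], match["spent_unit"])
        window = to_seconds(match["window"], match["window_unit"])
        results.append((int(match["seq"]), spent, window, spent / window))
    return results


# Two overhead lines copied from the log excerpt above.
SAMPLE = [
    "[2019-01-10T11:02:25,735][WARN ][o.e.m.j.JvmGcMonitorService] [####] "
    "[gc][53] overhead, spent [1s] collecting in the last [1.6s]",
    "[2019-01-10T11:02:46,930][INFO ][o.e.m.j.JvmGcMonitorService] [####] "
    "[gc][71] overhead, spent [906ms] collecting in the last [1.8s]",
]

for seq, spent, window, fraction in gc_overhead(SAMPLE):
    print(f"gc #{seq}: {spent:.3f}s of {window:.3f}s ({fraction:.0%})")
```

Running this over the full node log would show how often the node spends more than half of a monitoring window in GC.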
Kindly suggest what needs to be fixed for better performance.