Hi,
I'm running Elasticsearch 5.6.3 on Docker, using the official Elastic image.
I start the process with -Xms31g and -Xmx31g. The host machine has enough RAM (256 GB). I have 5 TB of data with 1,600,000,000 docs.
On cluster startup I get GC overhead warnings even when the cluster is idle (no ongoing searches), and when I run intensive search requests the GC overhead climbs to 59 seconds!
Here is an extract of the logs (cluster idle):
[2017-10-16T08:42:30,274][INFO ][o.e.l.LicenseService ] [Node0] license [a6b18c87-24e5-4289-b94f-ec7b12c6926a] mode [basic] - valid
[2017-10-16T08:42:32,880][INFO ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][14] overhead, spent [388ms] collecting in the last [1s]
[2017-10-16T08:42:38,236][INFO ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][young][19][3] duration [982ms], collections [1]/[1.3s], total [982ms]/[1.6s], memory [3.3gb]->[2.2gb]/[30.6gb], all_pools {[young] [2.2gb]->[50.4mb]/[2.4gb]}{[survivor] [316.1mb]->[316.1mb]/[316.1mb]}{[old] [875.6mb]->[1.8gb]/[27.9gb]}
[2017-10-16T08:42:38,237][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][19] overhead, spent [982ms] collecting in the last [1.3s]
[2017-10-16T08:43:17,726][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][young][58][4] duration [1.3s], collections [1]/[1.4s], total [1.3s]/[2.9s], memory [4.6gb]->[3gb]/[30.6gb], all_pools {[young] [2.4gb]->[22.9mb]/[2.4gb]}{[survivor] [316.1mb]->[316.1mb]/[316.1mb]}{[old] [1.8gb]->[2.6gb]/[27.9gb]}
[2017-10-16T08:43:17,727][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][58] overhead, spent [1.3s] collecting in the last [1.4s]
Cluster performing intensive search requests:
[2017-10-16T08:17:14,617][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][258188] overhead, spent [42.3s] collecting in the last [42.4s]
[2017-10-16T08:18:03,941][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][old][258189][18] duration [49.2s], collections [1]/[49.3s], total [49.2s]/[12.7m], memory [30.6gb]->[30.6gb]/[30.6gb], all_pools {[young] [2.4gb]->[2.4gb]/[2.4gb]}{[survivor] [306.1mb]->[311.1mb]/[316.1mb]}{[old] [27.9gb]->[27.9gb]/[27.9gb]}
[2017-10-16T08:18:03,941][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][258189] overhead, spent [49.2s] collecting in the last [49.3s]
[2017-10-16T08:18:55,724][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][old][258190][19] duration [51.7s], collections [1]/[51.7s], total [51.7s]/[13.5m], memory [30.6gb]->[30.6gb]/[30.6gb], all_pools {[young] [2.4gb]->[2.4gb]/[2.4gb]}{[survivor] [311.1mb]->[309mb]/[316.1mb]}{[old] [27.9gb]->[27.9gb]/[27.9gb]}
[2017-10-16T08:18:55,724][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][258190] overhead, spent [51.7s] collecting in the last [51.7s]
[2017-10-16T08:19:55,350][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][old][258191][20] duration [59.5s], collections [1]/[59.6s], total [59.5s]/[14.5m], memory [30.6gb]->[30.6gb]/[30.6gb], all_pools {[young] [2.4gb]->[2.4gb]/[2.4gb]}{[survivor] [309mb]->[308.2mb]/[316.1mb]}{[old] [27.9gb]->[27.9gb]/[27.9gb]}
[2017-10-16T08:19:55,351][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] [gc][258191] overhead, spent [59.5s] collecting in the last [59.6s]
[2017-10-16T08:19:55,372][INFO ][o.e.d.z.ZenDiscovery ] [Node0] master_left [{Node2}{BiklfW9OTk2myZ1vDDwreg}{dQto8egcSDm1AqetpU8NNg}{10.150.232.143}{10.150.232.143:9300}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
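To put a number on it, here is a minimal Python sketch for turning the "overhead" lines above into a fraction of wall-clock time spent in GC. The regex and the names `OVERHEAD_RE`, `UNIT_SECONDS` and `parse_overhead` are my own and are only based on the log format shown in this post, so treat them as assumptions rather than anything provided by Elasticsearch.

```python
import re

# Matches lines such as:
#   [gc][258191] overhead, spent [59.5s] collecting in the last [59.6s]
# The pattern is inferred from the log lines in this post (assumption).
OVERHEAD_RE = re.compile(
    r"\[gc\]\[(\d+)\] overhead, spent \[([\d.]+)(ms|s|m)\] "
    r"collecting in the last \[([\d.]+)(ms|s|m)\]"
)

# Conversion factors for the time units seen in the logs.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_overhead(line):
    """Return (gc_seq, spent_s, window_s, ratio) or None if the line does not match."""
    m = OVERHEAD_RE.search(line)
    if m is None:
        return None
    seq, spent, spent_unit, window, window_unit = m.groups()
    spent_s = float(spent) * UNIT_SECONDS[spent_unit]
    window_s = float(window) * UNIT_SECONDS[window_unit]
    return int(seq), spent_s, window_s, spent_s / window_s


if __name__ == "__main__":
    samples = [
        "[2017-10-16T08:42:32,880][INFO ][o.e.m.j.JvmGcMonitorService] [Node0] "
        "[gc][14] overhead, spent [388ms] collecting in the last [1s]",
        "[2017-10-16T08:19:55,351][WARN ][o.e.m.j.JvmGcMonitorService] [Node0] "
        "[gc][258191] overhead, spent [59.5s] collecting in the last [59.6s]",
    ]
    for line in samples:
        seq, spent_s, window_s, ratio = parse_overhead(line)
        print(f"gc #{seq}: {spent_s:.3f}s of {window_s:.3f}s in GC ({ratio:.1%})")
```

On the two sample lines this works out to roughly 39% of the time in GC while idle and over 99% under search load, which matches the old generation staying at 27.9gb before and after each collection.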
Do you have any idea why this is happening?
Thank you.