I've been having a variety of performance issues with ES 6.1.1 since upgrading from ES 1.7.5, and I'm trying to break them down to identify a primary cause.
ES 6.1.1
NEST 6.0.1
3-node cluster: 34 indices, ~1,200 shards, ~30M docs, ~98GB total
Windows VMs (56GB of RAM with a 24GB heap for ES, 8 cores, hundreds of GB of free disk)
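For context, the shard/doc/heap figures above can be pulled from the standard cluster APIs. A minimal sketch (Python with `requests`; `localhost:9200` is a placeholder for one of the nodes):

```python
import requests

ES = "http://localhost:9200"  # placeholder; point at any node in the cluster

# Cluster-wide node and shard counts
health = requests.get(f"{ES}/_cluster/health").json()
print(health["number_of_nodes"], "nodes,",
      health["active_shards"], "active shards,",
      health["active_primary_shards"], "primaries")

# Per-node heap size vs. current usage
stats = requests.get(f"{ES}/_nodes/stats/jvm").json()
for node in stats["nodes"].values():
    mem = node["jvm"]["mem"]
    print(node["name"],
          "heap used:", round(mem["heap_used_in_bytes"] / 2**30, 1), "GB",
          "of", round(mem["heap_max_in_bytes"] / 2**30, 1), "GB")
```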
Frequent Garbage Collection
I see the GC running frequently and, as far as I can tell, needlessly. With a 24GB heap, usage rarely exceeds 12GB, and in quieter times it hovers around 4-8GB or less. At one point the GC also spends 25s collecting. Here's a sample from the logs:
[2018-04-10T13:35:51,628][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550827] overhead, spent [515ms] collecting in the last [1s]
[2018-04-10T13:35:52,784][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550828] overhead, spent [679ms] collecting in the last [1.1s]
[2018-04-10T13:35:53,971][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550829] overhead, spent [643ms] collecting in the last [1.1s]
[2018-04-10T13:35:55,081][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550830] overhead, spent [723ms] collecting in the last [1.1s]
[2018-04-10T13:35:56,378][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550831] overhead, spent [837ms] collecting in the last [1.2s]
[2018-04-10T13:35:57,393][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550832] overhead, spent [567ms] collecting in the last [1s]
[2018-04-10T13:36:07,690][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550842] overhead, spent [618ms] collecting in the last [1s]
[2018-04-10T13:36:08,706][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550843] overhead, spent [647ms] collecting in the last [1s]
[2018-04-10T13:36:09,909][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550844] overhead, spent [572ms] collecting in the last [1.2s]
[2018-04-10T13:36:10,925][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550845] overhead, spent [420ms] collecting in the last [1s]
[2018-04-10T13:36:13,018][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550846] overhead, spent [958ms] collecting in the last [1.1s]
[2018-04-10T13:36:14,034][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550847] overhead, spent [527ms] collecting in the last [1.9s]
[2018-04-10T13:36:15,034][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550848] overhead, spent [465ms] collecting in the last [1s]
[2018-04-10T13:36:16,050][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550849] overhead, spent [565ms] collecting in the last [1s]
[2018-04-10T13:36:17,065][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550850] overhead, spent [650ms] collecting in the last [1s]
[2018-04-10T13:36:18,112][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550851] overhead, spent [514ms] collecting in the last [1s]
[2018-04-10T13:36:19,128][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550852] overhead, spent [684ms] collecting in the last [1s]
[2018-04-10T13:36:20,690][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550853] overhead, spent [910ms] collecting in the last [1.5s]
[2018-04-10T13:36:21,706][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550854] overhead, spent [552ms] collecting in the last [1s]
[2018-04-10T13:36:22,722][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550855] overhead, spent [472ms] collecting in the last [1s]
[2018-04-10T13:36:23,815][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550856] overhead, spent [622ms] collecting in the last [1s]
[2018-04-10T13:36:24,831][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550857] overhead, spent [514ms] collecting in the last [1s]
[2018-04-10T13:36:25,847][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550858] overhead, spent [576ms] collecting in the last [1s]
[2018-04-10T13:36:26,862][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550859] overhead, spent [627ms] collecting in the last [1s]
[2018-04-10T13:36:27,909][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550860] overhead, spent [629ms] collecting in the last [1s]
[2018-04-10T13:36:28,925][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550861] overhead, spent [563ms] collecting in the last [1s]
[2018-04-10T13:36:30,081][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550862] overhead, spent [640ms] collecting in the last [1.1s]
[2018-04-10T13:36:31,144][INFO ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550863] overhead, spent [520ms] collecting in the last [1s]
[2018-04-10T13:36:32,425][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550864] overhead, spent [752ms] collecting in the last [1.2s]
[2018-04-10T13:36:33,487][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550865] overhead, spent [541ms] collecting in the last [1s]
[2018-04-10T13:36:58,644][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][old][550866][76] duration [25.1s], collections [2]/[25.1s], total [25.1s]/[57.9s], memory [23.5gb]->[2.8gb]/[23.9gb], all_pools {[young] [436mb]->[105.4mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [23gb]->[2.7gb]/[23.3gb]}
[2018-04-10T13:36:58,644][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550866] overhead, spent [25.1s] collecting in the last [25.1s]
[2018-04-10T13:36:59,785][WARN ][o.e.m.j.JvmGcMonitorService] [ES2-QUERY] [gc][550867] overhead, spent [592ms] collecting in the last [1.1s]
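To correlate pauses like these with heap occupancy, something like the sketch below could poll the nodes stats API and record heap usage plus old-gen collection counters alongside the GC log (again Python with `requests`, placeholder host; an illustration only, not what's running in production):

```python
import time
import requests

ES = "http://localhost:9200"  # placeholder; point at the node being watched

# Poll once a minute; jumps in old-gen collection_time_in_millis should line up
# with the long pauses reported by JvmGcMonitorService above.
while True:
    stats = requests.get(f"{ES}/_nodes/stats/jvm").json()
    for node in stats["nodes"].values():
        jvm = node["jvm"]
        old = jvm["gc"]["collectors"]["old"]
        print(node["name"],
              "heap:", jvm["mem"]["heap_used_percent"], "%",
              "| old-gen collections:", old["collection_count"],
              "| total old-gen time:", old["collection_time_in_millis"], "ms")
    time.sleep(60)
```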
This doesn't look normal to me. What should I be looking at to track down the cause of this behavior?