We're running ES 6.8.4 on a cluster with heavy search activity. For the past few weeks we've been seeing nodes go into a continuous garbage collection loop that eventually crashes the ES process.
The issue is triggered by a spike in search activity (from < 100/s to 2000/s). It has occurred on each of the 4 data nodes in the cluster (which also has 2 dedicated coordinating nodes and 3 dedicated master nodes).
The cluster has < 100GB of data in primary shards, across 8 indices (90% of that is in the 2 most heavily searched).
The data nodes originally had a 31GB heap, out of 64GB total. We reduced the heap to 16GB as a test and the issue remains.
The nodes are Ubuntu 18.04.3 VMs on Hyper-V. The timing does coincide with a recent increase to 24 cores. Reducing that back is the next thing we'll test, but it's hard to understand how it might be related.
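One place the core count could plausibly matter, as far as I can tell from the 6.x thread pool documentation: the search thread pool defaults to int((available_processors * 3) / 2) + 1 threads, so more cores means more searches running concurrently on each node, each holding some heap. This is only a sketch of that arithmetic and not a confirmed cause. The function name is made up for illustration, and it assumes the default has not been overridden in elasticsearch.yml.

```python
def default_search_pool_size(available_processors):
    """Default size of the 'search' thread pool in ES 6.x, per the documented formula.

    Assumes thread_pool.search.size and the 'processors' setting are left at
    their defaults.
    """
    return (available_processors * 3) // 2 + 1


if __name__ == "__main__":
    # 24 is the core count the VMs were recently increased to.
    print(default_search_pool_size(24))
```

Comparing the result for the old and new core counts would show how much more search concurrency each data node now allows during a spike.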
The JVM has been upgraded to 1.8.0_222, the OS has been upgraded, and one of the nodes has been fully rebuilt as a test, and the issue remains.
Any advice on next steps for troubleshooting or resolving this?
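In case it helps with diagnosis, below is a rough sketch of how heap usage and old-generation GC counters could be sampled per node from the `_nodes/stats/jvm` API while a spike is happening. The base URL and the helper names are placeholders of mine, not anything from our setup, and the script assumes the cluster is reachable without authentication.

```python
import json
import urllib.request


def summarize_jvm(stats):
    """Map node name -> (heap_used_percent, old_gc_count, old_gc_millis).

    `stats` is the parsed JSON body of GET /_nodes/stats/jvm.
    """
    summary = {}
    for node in stats["nodes"].values():
        jvm = node["jvm"]
        old = jvm["gc"]["collectors"]["old"]
        summary[node["name"]] = (
            jvm["mem"]["heap_used_percent"],
            old["collection_count"],
            old["collection_time_in_millis"],
        )
    return summary


def fetch_jvm_stats(base_url="http://localhost:9200"):
    # base_url is a placeholder; point it at any node in the cluster.
    with urllib.request.urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.load(resp)


if __name__ == "__main__":
    for name, (heap_pct, gc_count, gc_ms) in sorted(summarize_jvm(fetch_jvm_stats()).items()):
        print(f"{name}: heap {heap_pct}%, old GC count {gc_count}, old GC time {gc_ms} ms")
```

Running this every few seconds and comparing successive samples should show whether old GC time climbs while heap stays near full, which is what the GC loop looks like from the outside.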