Our cluster has recently run into problems because nodes are spending a large amount of time performing GC on the young space. This causes high CPU load, which in turn makes the cluster unusable for our application's searches and other requests.
It's a 15-node cluster with 64GB RAM (32GB assigned to the JVM) and 8 cores. It has 13 search threads enabled.
In summary, the problem is caused by an application performing a large number of term searches, with each returned document being ~5KB in size. The cluster can be hit with these searches continuously for hours on end. Neither query caching nor refresh is enabled on the index being searched (216,614,659 documents, 15 shards, 2 replicas).
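For reference, here is a rough back-of-envelope on what those figures imply. It is only a sketch: it assumes shard copies are spread evenly across the nodes, and the search rate and hits-per-search arguments are hypothetical placeholders, not measurements from our cluster. The byte estimate counts only the raw ~5KB of each returned document, not the actual heap allocation.

```python
DOCS = 216_614_659
PRIMARY_SHARDS = 15
REPLICAS = 2
NODES = 15
DOC_BYTES = 5 * 1024  # "~5KB" per returned document, taken as an approximation


def shard_layout(docs, primaries, replicas, nodes):
    """Return the per-shard and per-node figures implied by the index settings."""
    total_copies = primaries * (1 + replicas)  # each primary plus its replicas
    return {
        "docs_per_shard": docs / primaries,
        "total_shard_copies": total_copies,
        # Assumes an even spread of copies over the nodes.
        "copies_per_node": total_copies / nodes,
    }


def source_bytes_per_second(searches_per_second, hits_per_search, doc_bytes=DOC_BYTES):
    """Raw document bytes returned per second for a given (hypothetical) load."""
    return searches_per_second * hits_per_search * doc_bytes


if __name__ == "__main__":
    print(shard_layout(DOCS, PRIMARY_SHARDS, REPLICAS, NODES))
    # Hypothetical load: 100 searches/s returning 10 documents each.
    print(source_bytes_per_second(100, 10))
```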
As it stands, we've had to disable the application that makes these searches, and I've been unable to find any recommendations on how to prevent these GC issues. Is there anything that can be tweaked in the Elasticsearch settings, or will the application itself have to throttle its searches?
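If throttling on the application side turns out to be the answer, this is a minimal sketch of what I imagine it could look like: a cap on the number of searches in flight at once. `SearchThrottle`, `demo`, and `fake_search` are hypothetical names for illustration; `fake_search` stands in for the real term search, and the limit of 2 is arbitrary, not a recommended value.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SearchThrottle:
    """Caps how many searches the application has in flight at once."""

    def __init__(self, max_in_flight):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, search_fn, *args, **kwargs):
        # Block until a slot is free, run the search, then release the slot.
        with self._slots:
            return search_fn(*args, **kwargs)


def demo(max_in_flight=2, workers=8, searches=20):
    """Run stand-in searches through the throttle and report peak concurrency."""
    throttle = SearchThrottle(max_in_flight)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_search(term):
        # Hypothetical stand-in for the real term search against the cluster.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # simulate the time a search takes
        with lock:
            state["active"] -= 1
        return term

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: throttle.run(fake_search, t), range(searches)))
    return state["peak"], results


if __name__ == "__main__":
    peak, results = demo()
    print("peak concurrent searches:", peak, "completed:", len(results))
```

The idea is that callers beyond the limit simply wait for a free slot, so the cluster never sees more than `max_in_flight` of these searches from the application at the same time. Whether that alone would be enough to relieve the young-generation GC pressure is something I don't know.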