We recently upgraded a 5.6 cluster to 6.3.1 and started seeing high heap utilization correlated with search thread pool queue exhaustion and blocking. We are logging queries from Kibana and from the client-side app (still using the 5.6 transport client), but nothing looks out of the ordinary. While the cluster is in this state, indexing drops to nothing, and all APIs except the health API (which returns green) stop responding. After about 30 minutes, the cluster marks the first node with locked search threads as unavailable and degrades to yellow health.
The quickest way for us to remediate the issue is to perform a restart of the ES processes.
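Next time this happens, we plan to capture a hot threads dump before restarting, along the lines of the sketch below (host/port are placeholders for one of our nodes; this assumes the nodes API still answers despite the hang):

```shell
# Dump the ~10 busiest threads per node (CPU view) and save the output
# with a timestamp, so we have evidence of what the search threads were
# doing before the restart wipes the state.
curl -s "http://localhost:9200/_nodes/hot_threads?threads=10&type=cpu" \
  > "hot_threads_$(date +%Y%m%d%H%M%S).txt"
```

Happy to attach that output here if it would help with the diagnosis.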
Cluster information:
OS: CentOS 6.10
Data Nodes: 10 x 15GB Heap
Master Nodes: 3 x 4GB Heap
Client Nodes: 3 x 2GB Heap
Primary Store Size: ~1TB
Shards: 30 x 3 Replicas
X-Pack: Default Open-Source, with monitoring enabled
Other configuration items: Default
Index rate is fairly low, ~500 docs/second.
[Screenshot] Heap utilization spike while search threads are active:

[Screenshot] Active search threads (the downtrend marks the start of the cluster restart):
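For context, the graph above comes from X-Pack monitoring; we can also reproduce the numbers with the cat thread pool API (sketch; `localhost:9200` is a placeholder for one of our client nodes):

```shell
# Poll active/queued/rejected search threads per node every 10 seconds
# to catch the queue filling up in real time.
while true; do
  curl -s "http://localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected"
  sleep 10
done
```

A steadily climbing `rejected` count alongside a pegged `active` column is what we see just before the APIs stop responding.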
Thanks for any insight into how we can resolve this!