Thread pool queue size suddenly spikes to thousands

Every once in a while, the thread pool queue size suddenly spikes to thousands.

When this happens, users notice that Kibana becomes pretty much unresponsive for minutes at a time. In the case above, it was unresponsive for about 20 minutes.

How do we debug and improve this? Any suggestions?

Is there anything in the logs around that time that might correlate? Can you run the hot threads API when this happens to see what is going on?

Also: what is the specification of the cluster, and which version of Elasticsearch are you using?
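For reference, here is one way to capture that information while the queue is spiking. This is a sketch assuming an unsecured node reachable at `localhost:9200`; adjust the host, port, and authentication for your setup, and the log path if you are not using a default DEB/RPM install:

```shell
# Snapshot the hottest threads on every node (top 5 per node)
curl -s "localhost:9200/_nodes/hot_threads?threads=5" > hot_threads.txt

# See which thread pools are queueing or rejecting, sorted by queue depth
curl -s "localhost:9200/_cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=queue:desc"

# Look for rejections or GC pressure in the server logs around the same time
grep -iE "rejected|gc" /var/log/elasticsearch/*.log | tail -n 50
```

Running the first two commands a few times during an incident, a minute or so apart, makes it much easier to see whether the backlog is on the `write`, `search`, or another pool, and whether it is confined to specific nodes.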

We will try the hot threads API the next time this happens. The ES version is 7.10.
The cluster is running 15 data nodes with 10TB disks each and is ingesting logs from a lot of backend services.

What is the full output of the cluster stats API?
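For anyone following along, the cluster stats API gives a cluster-wide summary (node counts, heap usage, shard and index totals). A minimal invocation, again assuming an unsecured node at `localhost:9200`:

```shell
# Human-readable, pretty-printed cluster-wide statistics
curl -s "localhost:9200/_cluster/stats?human&pretty"
```

Posting the full JSON output makes it possible to spot obvious sizing issues such as too many shards per node or undersized heaps.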

7.10 is quite old; you should upgrade, as 7.15 is the latest.
