How to Diagnose Search Queue Growth

elm · March 16, 2016, 7:19pm

Hello, I'm trying to diagnose an issue where our search queue seemingly randomly fills up.

The behavior we observe in our monitor is that on one node of our cluster the search queue growth (just one) and after the search thread pool is used up we start getting timeouts of course. There seems to be one query that is blocking the while thing. The only way for us to resolve the problem at the moment is to restart the node.

You can see below the relevant behavior in charts: First the queue size, then the pending cluster tasks (to show that no other operations are blocking or queing up, e.g. index operations or so) and finally the active threads for the search thread pool. The spike at 11 o'clock is the restart of the node.

The log files on all nodes show no entries during an hour before or after the issue until we restarted the node. Only garbage collection events of around 200 -600ms and only one on the relevant node but that is around 20 minutes before the event.

My questions:

how can I debug this as there is no information logged anywhere on a failing or timing out query?
what are possible reasons for this? We don't have dynamic queries or anything similar
can I set a query timeout or clear / reset active searches when this happens to prevent a node restart?

Thanks,
Elmar

Topic		Replies	Views
Search queue rejections climb while search queue stays empty? Elasticsearch	2	1321	December 8, 2017
Elasticsearch Search Threads not making sense Elasticsearch	6	637	March 2, 2018
Hanging active search threads Elasticsearch	1	315	July 13, 2020
Viewing searches/queries in progress? Elasticsearch	2	733	March 20, 2017
Longer search queue when we have low search thread utilization Elasticsearch	3	358	January 13, 2021

How to Diagnose Search Queue Growth

Related topics