Does the time logged in slow logs include the time it spent in the search queue on the node?

Priyanka_Raju · September 3, 2020, 4:49pm

Hi,

We are observing that a few nodes in our cluster seem to have a spike in search queue items (correlated with a spike in latency). We rely on slow logs for part of our debugging, and I wasn't sure whether the took_millis logged in the slow logs includes the time a request spent in the queue before execution. Is there any way to get a breakdown of how much time a request spent in the queue?

We are using ES 5.6

Emanuil · September 3, 2020, 7:42pm

took_millis does not include the time spent in the queue before execution. As per the slowlog docs:

The logging is done on the shard level scope, meaning the execution of a search request within a specific shard. It does not encompass the whole search request, which can be broadcast to several shards in order to execute. Some of the benefits of shard level logging is the association of the actual execution on the specific machine, compared with request level.
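Because the slow log is written per shard, a single search request can produce several entries (one for each shard whose query or fetch phase crossed a threshold), and none of them covers time spent waiting in the queue. The minimal Python sketch below groups took_millis by (index, shard) so per-shard outliers stand out. It assumes the 5.x log line layout shown in the comment; the regex and the `took_by_shard` name are hypothetical, so check them against your own log lines.

```python
import re
from collections import defaultdict

# Assumed layout of a search slowlog line (verify against your own logs):
# [timestamp][WARN ][index.search.slowlog.query] [node] [index][shard] took[1.2s], took_millis[1200], ...
SLOWLOG_RE = re.compile(
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s+"
    r"took\[[^\]]*\],\s*took_millis\[(?P<took>\d+)\]"
)


def took_by_shard(lines):
    """Group took_millis values by (index, shard) from search slowlog lines.

    Lines that do not match the assumed layout are skipped.
    """
    result = defaultdict(list)
    for line in lines:
        match = SLOWLOG_RE.search(line)
        if match:
            key = (match.group("index"), int(match.group("shard")))
            result[key].append(int(match.group("took")))
    return dict(result)
```

Keying on the shard rather than on the request mirrors the shard-level scope described in the docs: the same request shows up once per slow shard, each with its own execution time.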

Is there any way to get a breakdown of how much time a request spent in the queue?

I can't think of a way; hopefully someone who knows will drop by. However, I'm not sure how much it would help you: the queue only fills up because there are too many requests or because they're being processed too slowly. I'd focus effort on resolving those underlying problems, and the Profile API may help you there.
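Profiling is enabled by adding `"profile": true` to the search body, and the response reports timings per shard. As far as I know these timings, like the slow log, measure execution on the shard and exclude time spent waiting in the search queue. The minimal sketch below wraps a query and sums the top-level query time for each shard. It assumes each shard entry has an `id` and a `searches` list whose `query` nodes carry `time_in_nanos`; that layout should be checked against the actual 5.6 output, and both function names are hypothetical.

```python
def profiled_body(query):
    """Wrap a query DSL dict in a search body with profiling enabled."""
    return {"profile": True, "query": query}


def query_time_per_shard(response):
    """Sum top-level query time_in_nanos per shard, returned in milliseconds.

    Assumes response["profile"]["shards"] is a list of entries, each with an
    "id" and a "searches" list whose items hold a "query" list of timed nodes.
    """
    totals = {}
    for shard in response.get("profile", {}).get("shards", []):
        nanos = 0
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                nanos += int(node.get("time_in_nanos", 0))
        totals[shard["id"]] = nanos / 1e6
    return totals
```

POST the output of `profiled_body(...)` to `/<index>/_search` and pass the parsed JSON response to `query_time_per_shard`. Only top-level nodes are summed because child query times are already included in their parent's time.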

Priyanka_Raju · September 3, 2020, 9:10pm

Hi @Emanuil,

Thanks for the response. I have been trying to profile these queries, but the weird thing is that they are slow only during certain periods; if I re-run a query now, it executes fine and completes in 1s. That's why I asked whether the slow logs include the queueing time. I read the slow log documentation, which says "execution on the specific machine", but search queues are also specific to the machine, so I wasn't sure.

Emanuil · September 4, 2020, 12:47am

I'll check if anyone else knows about profiling the queue :). Do your metrics show anything, such as CPU, disk, or network I/O spikes, around the time you notice the intermittent slowness?

Priyanka_Raju · September 4, 2020, 9:59pm

Hi @Emanuil,

I definitely noticed a CPU usage spike (above 90%). Disk, network, and I/O seem about normal.
On at least one of the hosts I was debugging, the thread_pool.management.queue metric also spiked.

On a few of the boxes I was looking at (I haven't checked the config for the others), I notice that while the search queue is building up, there is an ongoing merge (the merges.current metric is 1 or more).

I understand merges can be I/O intensive; are they CPU intensive as well? Or is the CPU spike more to do with the searches?
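To check whether search queue spikes line up with running merges and high CPU across all nodes, not just a few boxes, one option is to take snapshots of `GET /_nodes/stats` during a slow period and line the numbers up per node. The minimal sketch below does that. It assumes the 5.x node stats layout (`thread_pool.search.queue`, `thread_pool.management.queue`, `indices.merges.current`, `os.cpu.percent`); the `search_pressure` name and its threshold parameter are hypothetical.

```python
def search_pressure(stats, queue_threshold=0):
    """Summarize search queue depth next to merges and CPU for each node.

    `stats` is the parsed JSON of GET /_nodes/stats. Only nodes whose search
    queue exceeds `queue_threshold` are kept, deepest queue first. Missing
    keys default to 0 (or None for CPU) so partial responses still work.
    """
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        pools = node.get("thread_pool", {})
        search = pools.get("search", {})
        queue = search.get("queue", 0)
        if queue <= queue_threshold:
            continue
        rows.append({
            "node": node.get("name", node_id),
            "search_queue": queue,
            "search_active": search.get("active", 0),
            "search_rejected": search.get("rejected", 0),
            "management_queue": pools.get("management", {}).get("queue", 0),
            "merges_current": node.get("indices", {}).get("merges", {}).get("current", 0),
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
        })
    return sorted(rows, key=lambda row: row["search_queue"], reverse=True)
```

A single snapshot only shows correlation at one instant; sampling it every few seconds across a slow period gives a better picture of whether merges consistently coincide with the queue building up.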

warkolm · September 7, 2020, 12:17am

A spike in IO can impact CPU as well, and vice versa.

When you see the slowness on the node, can you check hot_threads?
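Since the slowness is intermittent, running hot_threads by hand may miss it. A small poller can fetch `/_nodes/stats/thread_pool` and request `/_nodes/<ids>/hot_threads` only for nodes whose search queue crosses a threshold, which may catch the busy threads in the act. This is a minimal sketch: the host URL, the threshold of 50, and the function names are hypothetical, and `threads`, `interval`, and `type` are the hot_threads query parameters (here set to their usual defaults, with `type=cpu`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(host, node_ids=None, threads=3, interval="500ms", kind="cpu"):
    """Build a hot_threads URL, scoped to specific nodes when node_ids is given."""
    if node_ids:
        path = "/_nodes/{}/hot_threads".format(",".join(node_ids))
    else:
        path = "/_nodes/hot_threads"
    query = urlencode({"threads": threads, "interval": interval, "type": kind})
    return host.rstrip("/") + path + "?" + query


def nodes_to_inspect(stats, queue_threshold):
    """Return sorted node IDs whose search queue exceeds the threshold."""
    return sorted(
        node_id
        for node_id, node in stats.get("nodes", {}).items()
        if node.get("thread_pool", {}).get("search", {}).get("queue", 0) > queue_threshold
    )


def capture_if_queued(host, queue_threshold=50):
    """Fetch hot_threads text for queued-up nodes, or None if no node is over the threshold."""
    with urlopen(host.rstrip("/") + "/_nodes/stats/thread_pool") as resp:
        stats = json.load(resp)
    nodes = nodes_to_inspect(stats, queue_threshold)
    if not nodes:
        return None
    with urlopen(hot_threads_url(host, nodes)) as resp:
        return resp.read().decode("utf-8")
```

Calling `capture_if_queued("http://localhost:9200")` from a loop or cron job and saving any non-None result with a timestamp makes it possible to compare the captured threads (search vs. merge threads, for example) against the slow log entries from the same period.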

system · October 5, 2020, 12:18am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
