How many threads does a typical search use?

I'm trying to figure out whether expensive queries are causing our search queues to back up, and I have some questions about how Elasticsearch counts threads.

When a search request fans out over multiple shards, does it use threads from each involved node's thread pool?

For example, if I have 10 nodes with 1 shard each (10 shards total) and I do a search request, would this search request be reflected as:

  • 10 "active" threads (one for each node/shard)? Or
  • 1 "active" thread for the search request?



Any thoughts on this? We're trying to diagnose occasional search queue spikes.

It would be helpful to know how many threads are available across our whole cluster and how many threads the slow queries are taking up.

When queries "fan out" to different shards do they also draw from the thread pools on each individual shard?


Each node has its own collection of thread pools, and search activity on a node is performed using threads from that node's search thread pool.

It's typically more like the former than the latter, but it may not be precisely either of these.
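One way to see this in practice is to check the per-node search thread pool counters with the cat thread pool API; a request along these lines (the column list is just a suggestion) shows the active thread count and queue depth on each node:

```
GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected,completed
```

A node whose `queue` column is consistently non-zero, or whose `rejected` count keeps climbing, is the one whose search capacity is saturated.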

Thanks, @DavidTurner.

So IIUC: the "typical" case is 1 thread per node/shard, and those threads come out of that node's search thread pool.

Are the "non-typical" cases worth considering? Would there ever be more than 1 thread per shard? And is the < 1 thread case a search where the request doesn't hit any shards?

The general case is fairly complicated, and may differ between versions of Elasticsearch. For instance, searches are divided up into a number of phases, but not all phases run on all nodes. It's hard to know what might be salient to describe here in more detail. Perhaps it would be simpler for you to describe the problem you are investigating instead?

Oh, interesting. We're still on Elasticsearch 5.6 if that helps.

The basic problem we're trying to solve is occasional large spikes in search queue size.

This question was an effort to determine if/how queries in our slow query logs reduce our capacity (and thus cause the search queue to back up).
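For reference, our slow query logs come from per-index search slow log thresholds; the configuration looks roughly like this (the index name and threshold values here are just illustrative, not our real settings):

```
PUT /our-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "2s"
}
```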

We believe that our search queue backing up is caused by some combination of:

  • Slow queries saturating search thread capacity
  • A high volume of indexed documents consuming CPU (we sometimes, but not always, see indexing spikes around the time our search queue backs up)
  • Indexing expensive (large) documents consuming CPU

Any information to help us narrow down the cause of the search queue spikes (or prove/disprove the ideas above) would be appreciated.

Yes, slow searches consume resources that can cause other searches involving the same nodes to be enqueued. Heavy indexing consumes CPU which can slow down searches on those nodes.

I think I would start by obtaining the output of GET _nodes/hot_threads while the cluster is struggling, as this will give us a clearer picture of what it's busy doing. You could also look for correlations between the spikes and the output of GET _nodes/stats, but hot threads would be the first thing I'd look at.
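If it helps, the hot threads output can be narrowed to the busiest few threads on each node; something like the following (the parameter values are just a starting point, not a recommendation):

```
GET _nodes/hot_threads?threads=5&interval=500ms
```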

Are there any interesting-looking log messages at around the times of the spikes? For instance do you see any evidence that the nodes are performing more GC than normal?

Hi @DavidTurner,

I looked at an example queue spike from last week and didn't find anything too interesting:

  • There was one GC message, [2018-10-08T20:44:53,129][INFO ][o.e.m.j.JvmGcMonitorService] [] [gc][381638] overhead, spent [296ms] collecting in the last [1s], but that doesn't sound too bad
  • Our ad-hoc logging of _nodes/hot_threads didn't have anything interesting at the time (basically had no results)

We significantly increased the number of bulk threads in use right before the search queue spike, so I'm thinking that could be the cause of this particular one.

Thank you for your interest/help. If I have more concrete questions/information I'll make a new topic.
