One query thread per Shard?

Bertrand · January 16, 2017, 2:12pm

When a query hits a shard, is it true that it is handled by a single thread?
If so, is it worth assigning as many shards as there are CPU cores on a node, to increase concurrency and reduce overall query execution time?
Is such an optimisation a best practice?

Christian_Dahlqvist · January 16, 2017, 3:52pm

Yes, the processing of a query is single threaded against each shard. Multiple queries can however be run concurrently against the same shard, so assuming you have more than one concurrent query, you can still use multiple cores. The ideal shard count per node therefore depends on the use case, so I would recommend benchmarking it to see what is right for your use case.

Bertrand · January 16, 2017, 4:10pm

Ok. Thanks.

As you said, whether or not it is a good idea depends on the use case.
The metrics use case is all about aggregations and dashboards showing multiple facets of the system. They are likely to send many queries at the same time hence achieving some concurrency on their own.

On the other hand, the logs use case is less concurrent as it is often made of a single query per user at a time. In this case concurrency is more a matter of how many concurrent users we have. Increasing the number of shards so each node is allocated more than one may bring some benefits.
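To make the trade-off concrete, here is a toy model of the reasoning above. It is only a sketch under stated assumptions: each (query, shard) pair is one single-threaded task, a node's data represents a fixed amount of single-threaded work that splits evenly across its shards, and per-shard overhead is ignored. The function name and all numbers are made up for illustration.

```python
import math


def estimated_latency_ms(total_work_ms, shards, cores, concurrent_queries=1):
    """Toy estimate of query latency on one node.

    Assumptions (not measured Elasticsearch behaviour):
    - total_work_ms is the single-threaded cost of scanning all data on the node
    - that cost splits evenly across shards, with no per-shard overhead
    - each (query, shard) pair is one task that occupies one core
    """
    per_shard_ms = total_work_ms / shards
    tasks = shards * concurrent_queries
    # Tasks run in "waves" of at most `cores` tasks at a time.
    waves = math.ceil(tasks / cores)
    return waves * per_shard_ms


if __name__ == "__main__":
    # Hypothetical node: 8 cores, 800 ms of single-threaded work per query.
    for shards, queries in [(1, 1), (8, 1), (1, 8), (8, 8)]:
        latency = estimated_latency_ms(800, shards, cores=8, concurrent_queries=queries)
        print(f"shards={shards} concurrent_queries={queries} -> {latency:.0f} ms")
```

In this model, extra shards only help while cores would otherwise sit idle: one query over 8 shards drops from 800 ms to 100 ms, but with 8 concurrent queries the cores are already busy and 8 shards gives the same 800 ms as 1 shard.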

Christian_Dahlqvist · January 16, 2017, 4:15pm

For metrics and logging use cases you generally end up using time-based indices, which means that the shard count is almost always considerably larger than the number of cores on a server. As each shard comes with some overhead in terms of heap usage and file handles, having too many shards can also cause problems and be inefficient.
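A rough way to see how quickly time-based indices add up is to count shard copies over the retention period. The sketch below is only illustrative: the function name, the retention period, and the node count are hypothetical values, not figures from a real cluster.

```python
def total_shard_copies(daily_indices, primaries, replicas, retention_days):
    """Count every shard copy (primaries plus replicas) kept in the cluster."""
    copies_per_index = primaries * (1 + replicas)
    return daily_indices * copies_per_index * retention_days


# Hypothetical setup: one daily index, 3 primaries, 1 replica, 30 days kept.
total = total_shard_copies(daily_indices=1, primaries=3, replicas=1, retention_days=30)

# Spread evenly over a hypothetical 3-node cluster.
per_node = total / 3

print(f"{total} shard copies in total, about {per_node:.0f} per node")
```

Even this small setup ends up with far more shards per node than a typical server has cores.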

Bertrand · January 16, 2017, 4:35pm

Indeed. We currently have one index per day, each configured with 3 shards and 1 replica.
The metrics indexes have about 240 million documents for a total size (including replica) of 45 GB.
The logs indexes have about 90 million documents for a total size (including replica) of 25 GB.
The ES nodes have 32 GB of RAM, of which 15 GB is dedicated to the ES heap.
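For reference, these figures can be turned into per-shard numbers with a bit of arithmetic. This sketch assumes the figures are per daily index, that the document counts cover primaries only, and that data is spread evenly across shards. None of this is confirmed above, and the function name is made up for illustration.

```python
def per_shard_stats(total_size_gb, total_docs, primaries, replicas):
    """Split index totals evenly across shard copies (an assumed distribution)."""
    copies = primaries * (1 + replicas)  # primaries plus their replicas
    return {
        "shard_copies": copies,
        "gb_per_shard": total_size_gb / copies,      # size includes replicas
        "docs_per_primary": total_docs // primaries,  # docs assumed primary-only
    }


metrics = per_shard_stats(total_size_gb=45, total_docs=240_000_000, primaries=3, replicas=1)
logs = per_shard_stats(total_size_gb=25, total_docs=90_000_000, primaries=3, replicas=1)

for name, stats in (("metrics", metrics), ("logs", logs)):
    print(name, stats)
```

Under those assumptions, a metrics shard is about 7.5 GB with 80 million documents, and a logs shard is a little over 4 GB with 30 million documents.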

Things are usually OK for metrics, since queries often address a relatively small timeframe, from one hour to a couple of hours. As we said earlier, each query is likely to be executed by a single thread on each node. However, there are many requests in parallel, hence achieving decent concurrency and overall response time.

However, log analytics is sometimes a bit heavier, mainly because of users building crazy queries. If the time period is less than a day, we could benefit from having more than one shard per node.

Or is it because my shards are becoming too large?

Christian_Dahlqvist · January 16, 2017, 4:41pm

Shard size does affect query speed, which is why it is generally recommended to benchmark in order to find the ideal shard size as described in this video.
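The timing side of such a benchmark could be sketched roughly as below. This is not the method from the video: `run_query` is a hypothetical callable that you would write to send a representative query to a test index built with the shard size under test, and the helper name and default repeat counts are arbitrary.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=20, warmup=3):
    """Time a zero-argument callable and return its median latency in ms."""
    # Warm-up runs are discarded so caches are populated before measuring.
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Running the same query set against indices that differ only in shard size, and comparing the medians, gives a first indication of where query latency starts to degrade.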

