Msearch performance decreases when increasing the number of shards

Elasticsearch version: 6.1.1

Plugins installed: [analysis-icu, analysis-phonetic]

Description of the problem:

I am testing the msearch capability on Rally's geonames data set with quite a lot of basic match queries (1500). Even though each individual query is very fast, I have noticed that the more shards I set for my index, the longer the whole msearch takes, no matter how many nodes/machines I set up in the cluster.

The thread pool might be a bottleneck on a single machine, but I would have expected some performance gain from horizontal scaling.

Steps to reproduce:

Create three geonames indices with the mapping provided in Rally's tracks:
https://github.com/elastic/rally-tracks/blob/master/geonames/index.json
Each one with a different number of shards (let's say 5, 10, 20), for example: geonames-5, geonames-10, geonames-20.
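
For reference, creating one of these indices with an explicit shard count could look like the sketch below (assuming a local node on localhost:9200 and no replicas; the mapping from the linked index.json would go into a "mappings" section of the same body):

curl -X PUT "localhost:9200/geonames-10" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 0
    }
  }
}'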

Populate them with Rally's geonames data set (I actually let Rally create the first index and used the wonderful reindex API to populate the others).
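
The reindex step could look like this sketch, assuming geonames-5 is the index Rally created:

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": { "index": "geonames-5" },
  "dest":   { "index": "geonames-10" }
}'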

Use the terms list (https://github.com/elastic/rally-tracks/blob/master/geonames/terms.txt) to generate the msearch "bodies" with a command like:

shuf terms.txt | head -1500 > subterms.txt
awk '{ print "{\"index\": \"geonames-5\"}\n{\"query\": {\"match\": {\"name\": {\"query\": \"" $0 "\" }}}}"}' subterms.txt > msearch.geonames-5
awk '{ print "{\"index\": \"geonames-10\"}\n{\"query\": {\"match\": {\"name\": {\"query\": \"" $0 "\" }}}}"}' subterms.txt > msearch.geonames-10
awk '{ print "{\"index\": \"geonames-20\"}\n{\"query\": {\"match\": {\"name\": {\"query\": \"" $0 "\" }}}}"}' subterms.txt > msearch.geonames-20

Run the msearches on each index.
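
For example, with curl against a local node (the msearch body is newline-delimited JSON, so on 6.x the Content-Type must be application/x-ndjson):

curl -s -X GET "localhost:9200/_msearch" -H 'Content-Type: application/x-ndjson' --data-binary @msearch.geonames-5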

Thanks for your help!

I am wondering whether this should be considered an issue or not. Any advice?

Hi,

Elasticsearch has several limits in place that could explain this behavior. First of all, the number of concurrent searches is limited based on the number of nodes and the size of the search thread pool (see also max_concurrent_searches in the msearch docs). After a node has received an msearch request, it sends each query individually to a node in the cluster (respecting max_concurrent_searches). That node coordinates the search request and queries the individual shards. To avoid overwhelming the cluster, there is another per-query limit in place that allows at most 256 concurrent shard requests, or the number of nodes times the default number of shards (5), whichever of those two numbers is smaller (see the source code). To be clear: the default number of shards here is the out-of-the-box default, not the configured default for that index. In the case of a three-node cluster, a search request would query at most 3 (nodes) * 5 (shards) = 15 shards concurrently.
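
To make the first limit concrete: on 6.x, max_concurrent_searches can be passed as a request parameter on _msearch. A minimal sketch, assuming a local node and the files generated above (the value 4 is arbitrary):

curl -s -X GET "localhost:9200/_msearch?max_concurrent_searches=4" -H 'Content-Type: application/x-ndjson' --data-binary @msearch.geonames-10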

As the geonames index is rather small (around 6GB), I guess the search time is dominated by:

  • Wait time spent in the search queue of the individual nodes
  • Wait time for results to arrive at the coordinating node

and finally also by the limit on the maximum number of concurrent shard requests.
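
If you want to verify the queueing part, one way is to watch the search thread pool on each node while the msearch runs, e.g. via the _cat API (the column selection here is just an example):

curl -s "es_host:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected"

A growing queue column during the run would point to wait time in the search queue.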

I think it could make sense to retry this experiment with a much larger index.

Daniel

Is there any config to limit MaxConcurrentShardRequests? Because in our case 256 is still too much.
Also, we use ES 2.4 and it seems that this limit was added in ES 6.x. So there was no limit before that, yeah?

Hi,

See the docs for how to limit it. This was introduced in 5.6.0 for the search API with Limit the number of concurrent shard requests per search request by s1monw · Pull Request #25632 · elastic/elasticsearch · GitHub, and Expose `max_concurrent_shard_requests` in `_msearch` by s1monw · Pull Request #33016 · elastic/elasticsearch · GitHub will bring the same parameter to msearch.
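
For the search API it is a request parameter, so lowering it below the default could look like this sketch (the index name, query, and the value 3 are just examples):

curl -X GET "es_host:9200/geonames-20/_search?max_concurrent_shard_requests=3" -H 'Content-Type: application/json' -d '
{
  "query": { "match": { "name": "berlin" } }
}'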

Daniel

Hi Daniel, and thanks a lot for the answer and details! It fits all my tests: I had first tried to play with the max_concurrent_searches setting (this defaultMaxConcurrentSearches looked pretty interesting) but without any success, which pointed to the coordinating node as the bottleneck. Then splitting the msearch and sending each part to different client nodes helped.

Oh btw, is there a better and finer-grained way to follow a thread pool than watch -n 0.1?

Hi Vincent,

I suppose you mean that you call the node stats API periodically? That's the usual approach to monitoring thread pool statistics from the command line. You can limit the node stats to thread pool statistics, e.g. by issuing curl -X GET "es_host:9200/_nodes/stats/thread_pool?pretty", and use a tool like jq to show only the fields you're interested in. Alternatively, you can use X-Pack monitoring to view thread pool statistics over time.
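
For example, to repeatedly show just the search pool per node (a sketch; the jq filter is only one way to slice the output):

watch -n 0.5 'curl -s "es_host:9200/_nodes/stats/thread_pool" | jq ".nodes[] | {node: .name, search: .thread_pool.search}"'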

Daniel

Yes, I'm calling the thread_pool stats periodically. I am going to give X-Pack monitoring another look, then.

Thank you again!

Vincent
