However, in my experience a 5-shard index returned query results twice as fast as a 1-shard index. Everything else was the same (same machine, same ES version, same data, same test query); the only difference was the number of shards. Is there a reason for that?
Some stats of my environments:
Setup 1 - 16GB ram, 1 node, 5 primary shards, 4GB heap (-Xms4g -Xmx4g)
Total main records 2.5m, total documents 3.8m (including nested), total index size 1GB
test_index 0 p 771822 209.1mb
test_index 1 p 770783 206.7mb
test_index 2 p 773046 205.7mb
test_index 3 p 770961 209.9mb
test_index 4 p 771440 205.7mb
A test query with 25 nested queries wrapped in dis_max took 13s.
Setup 2 - 16GB ram, 1 node, 1 primary shard, 4GB heap (-Xms4g -Xmx4g)
Total main records 2.5m, total documents 3.8m (including nested), total index size 1GB
test_index 0 p 3858052 1000.8mb
The same query with 25 nested queries wrapped in dis_max took 26s.
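For readers unfamiliar with the query shape being benchmarked, here is a minimal sketch of a dis_max query wrapping 25 nested clauses. The field names (`comments`, `comments.text`) and match terms are placeholders I am assuming for illustration; the original post does not show its mapping or query body.

```python
# Sketch of the kind of request body described above: 25 "nested" clauses
# wrapped in a single "dis_max". Field names and match values are
# illustrative placeholders, not taken from the original post.
def build_dis_max_query(num_clauses=25):
    nested_clauses = [
        {
            "nested": {
                "path": "comments",  # assumed nested field
                "query": {"match": {"comments.text": f"term_{i}"}},
            }
        }
        for i in range(num_clauses)
    ]
    return {"query": {"dis_max": {"queries": nested_clauses}}}

body = build_dis_max_query()
print(len(body["query"]["dis_max"]["queries"]))  # 25
```

Each of the 25 nested clauses has to be evaluated against the nested documents, which is what makes this query CPU-heavy.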
It sounds like you have a quite complex query running against an index that fully fits within the operating system page cache, so no disk I/O should be required here. You also have low query concurrency, as you run just a single query at a time. I would expect this to make your scenario largely CPU bound.
As each query is basically executed in a single thread against each shard (different shards are processed in parallel), your query against a single shard can only use one core, which is likely to make it slower in this scenario. Note, however, that querying 5 shards is not 5 times faster. If your query concurrency were higher or you had more data, this could very well change.
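The reasoning above can be captured in a deliberately simplified latency model: work is split evenly across shards, each shard is searched by one thread, and shards run concurrently up to the number of cores. This is a back-of-the-envelope sketch, not a measurement; real speedups are smaller because of per-shard overhead and result merging.

```python
# Simplified model of the explanation above: each shard is searched in
# a single thread, and different shards are processed in parallel across
# the available cores. Numbers are illustrative, not benchmarks.
def estimated_latency(total_work_s, num_shards, num_cores):
    per_shard = total_work_s / num_shards   # work is split across shards
    parallel = min(num_shards, num_cores)   # shards searched concurrently
    waves = -(-num_shards // parallel)      # ceiling division: batches of shards
    return per_shard * waves

# One shard: the whole query runs on a single core.
print(estimated_latency(25.0, num_shards=1, num_cores=8))  # 25.0
# Five shards on an 8-core machine: all shards search concurrently.
print(estimated_latency(25.0, num_shards=5, num_cores=8))  # 5.0
```

Under this model a 5-shard index would be up to 5x faster for a single query, which is consistent with (though more optimistic than) the observed 2x difference.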
Query latency and optimal shard size and count will depend on a large number of factors, so it is important to always test with a scenario as realistic as possible before you draw any conclusions.
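One practical way to run such a test is to create otherwise-identical indices that differ only in `number_of_shards`, load the same data into each, and time the same query against both. The helper and index names below are illustrative assumptions, not from the original thread.

```python
# Sketch of index settings for a shard-count comparison on a single node.
# Only "number_of_shards" differs between the two test indices.
def index_settings(num_shards):
    return {
        "settings": {
            "number_of_shards": num_shards,
            "number_of_replicas": 0,  # single node, so replicas stay unassigned anyway
        }
    }

# Hypothetical index names for the two setups being compared.
for name, shards in [("test_index_1shard", 1), ("test_index_5shards", 5)]:
    print(name, index_settings(shards)["settings"]["number_of_shards"])
```

Running the realistic query mix at a realistic concurrency against each index, rather than a single query, gives a much better picture of which shard count wins in production.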