Any reason why having 1 shard is slower than 5 shards?

According to various online sources (e.g. https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html), fewer shards should mean faster searches.

However, in my experience a 5-shard index returned query results twice as fast as a 1-shard index. Everything else was the same (same computer, same ES version, same data, same test query); the only difference was the number of shards. Is there a reason for that?

Some stats of my environments:
Setup 1 - 16GB ram, 1 node, 5 primary shards, 4GB heap (-Xms4g -Xmx4g)
Total main records 2.5m, total documents 3.8m (including nested), total index size 1GB
test_index 0 p 771822 209.1mb
test_index 1 p 770783 206.7mb
test_index 2 p 773046 205.7mb
test_index 3 p 770961 209.9mb
test_index 4 p 771440 205.7mb
Test query with 25 nested queries wrapped in dis_max took 13s

Setup 2 - 16GB ram, 1 node, 1 primary shard, 4GB heap (-Xms4g -Xmx4g)
Total main records 2.5m, total documents 3.8m (including nested), total index size 1GB
test_index 0 p 3858052 1000.8mb
Test query with 25 nested queries wrapped in dis_max took 26s

It sounds like you have quite a complex query running against an index that fully fits within the operating system page cache, so no disk I/O should be required here. You also have low query concurrency, as you run just a single query. I would expect this to make your scenario largely CPU bound.

As each query is basically executed in a single thread against each shard (different shards are processed in parallel), your query against a single shard can only use one core, which is likely to make it slower in this scenario. Note, however, that querying 5 shards is not 5 times faster. If your query concurrency were higher or you had more data, this could very well change.
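To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It compares the timings reported above with a best-case estimate in which the work splits evenly across shards and scales perfectly. The function name `ideal_latency` and the core count of 8 are assumptions for illustration only; the core count of the test machine was not stated.

```python
def ideal_latency(single_shard_seconds, shards, cores):
    """Best-case wall-clock time for one query on an otherwise idle node.

    Assumes the work splits evenly across shards, each shard is searched
    by one thread, and there is no coordination or merge overhead.
    """
    return single_shard_seconds / min(shards, cores)


# Timings reported in the question.
observed_1_shard = 26.0
observed_5_shards = 13.0

# Hypothetical core count; not taken from the thread.
assumed_cores = 8

speedup = observed_1_shard / observed_5_shards
efficiency = speedup / 5  # fraction of the ideal 5x that was achieved
best_case = ideal_latency(observed_1_shard, 5, assumed_cores)

print(f"observed speedup: {speedup:.1f}x")
print(f"parallel efficiency over 5 shards: {efficiency:.0%}")
print(f"best-case 5-shard latency: {best_case:.1f}s")
```

The gap between the best-case estimate and the measured 13s is the reason to benchmark rather than to extrapolate from shard count alone.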

Query latency and optimal shard size and count will depend on a large number of factors, so it is important to always test with a scenario as realistic as possible before you draw any conclusions.
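A minimal sketch of such a test in Python, using only the standard library. It reads the `took` value (milliseconds) that Elasticsearch returns in the search response and reports the median over several runs. The URL `http://localhost:9200` and the helper names are assumptions; substitute your own index and your real query body.

```python
import json
import statistics
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single-node cluster


def search_took_ms(index, query_body, base_url=ES_URL):
    """Run one search and return the server-reported `took` in milliseconds."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["took"]


def median_took_ms(run_once, repeats=5, warmups=1):
    """Call run_once() a few times and return the median of the timed runs.

    Warm-up runs are discarded so that cold caches do not skew the result.
    """
    for _ in range(warmups):
        run_once()
    return statistics.median(run_once() for _ in range(repeats))


# Example usage (needs a running cluster, so it is left commented out):
# body = {"query": {"match_all": {}}}  # placeholder; use the real test query
# print(median_took_ms(lambda: search_took_ms("test_index", body)))
```

Running the same measurement against each candidate index, ideally with several concurrent clients if production will have them, gives a fairer comparison than a single timed query.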


Thank you for your response. You were right that CPU was maxed out while processing the test query.

So in my case of complex queries I actually need to increase the shard count rather than to reduce it?

If this is the full data set you will have in production and you expect very low query concurrency, that would appear to be the case.
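For reference, the number of primary shards is set when the index is created, via the `index.number_of_shards` setting. A hedged sketch in Python that only builds the create-index request; the index name `test_index_5s`, the URL, and the helper name are hypothetical, and the existing data would still need to be reindexed into the new index.

```python
import json
import urllib.request


def create_index_request(index, shards, replicas=0,
                         base_url="http://localhost:9200"):
    """Build (but do not send) a PUT request that creates an index
    with the given number of primary shards."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("test_index_5s", shards=5)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually create the index: urllib.request.urlopen(req)
```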
