However, in my experience a 5-shard index returned query results twice as fast as a 1-shard index. Everything else was the same (same machine, same ES version, same data, same test query); the only difference was the number of shards. Is there a reason for that?
Some stats of my environments:
Setup 1 - 16GB ram, 1 node, 5 primary shards, 4GB heap (-Xms4g -Xmx4g)
Total main records 2.5m, total documents 3.8m (including nested), total index size 1GB
test_index 0 p 771822 209.1mb
test_index 1 p 770783 206.7mb
test_index 2 p 773046 205.7mb
test_index 3 p 770961 209.9mb
test_index 4 p 771440 205.7mb
A test query with 25 nested queries wrapped in dis_max took 13s.
Setup 2 - 16GB ram, 1 node, 1 primary shard, 4GB heap (-Xms4g -Xmx4g)
Total main records 2.5m, total documents 3.8m (including nested), total index size 1GB
test_index 0 p 3858052 1000.8mb
The same query with 25 nested queries wrapped in dis_max took 26s.
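For readers unfamiliar with the query shape being benchmarked, here is a minimal sketch of a dis_max query wrapping 25 nested clauses. The field names (`comments`, `comments.text`) and match terms are placeholders I am assuming for illustration; the original post does not show its mapping or query body.

```python
# Sketch of the kind of request body described above: 25 "nested" clauses
# wrapped in a single "dis_max". Field names and match values are
# illustrative placeholders, not taken from the original post.
def build_dis_max_query(num_clauses=25):
    nested_clauses = [
        {
            "nested": {
                "path": "comments",  # assumed nested field
                "query": {"match": {"comments.text": f"term_{i}"}},
            }
        }
        for i in range(num_clauses)
    ]
    return {"query": {"dis_max": {"queries": nested_clauses}}}

body = build_dis_max_query()
print(len(body["query"]["dis_max"]["queries"]))  # 25
```

Each of the 25 nested clauses has to be evaluated against the nested documents, which is what makes this query CPU-heavy.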
It sounds like you have a quite complex query running against an index that fully fits within the operating system page cache, so no disk I/O should be required here. You also have low query concurrency, as you run just a single query at a time. I would expect this to make your scenario largely CPU bound.
As each query is basically executed in a single thread against each shard (different shards are processed in parallel), your query against a single shard can only use one core, which is likely to make it slower in this scenario. Note, however, that querying 5 shards is not 5 times faster. If your query concurrency were higher or you had more data, this could very well change.
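The reasoning above can be captured in a deliberately simplified latency model: work is split evenly across shards, each shard is searched by one thread, and shards run concurrently up to the number of cores. This is a back-of-the-envelope sketch, not a measurement; real speedups are smaller because of per-shard overhead and result merging.

```python
# Simplified model of the explanation above: each shard is searched in
# a single thread, and different shards are processed in parallel across
# the available cores. Numbers are illustrative, not benchmarks.
def estimated_latency(total_work_s, num_shards, num_cores):
    per_shard = total_work_s / num_shards   # work is split across shards
    parallel = min(num_shards, num_cores)   # shards searched concurrently
    waves = -(-num_shards // parallel)      # ceiling division: batches of shards
    return per_shard * waves

# One shard: the whole query runs on a single core.
print(estimated_latency(25.0, num_shards=1, num_cores=8))  # 25.0
# Five shards on an 8-core machine: all shards search concurrently.
print(estimated_latency(25.0, num_shards=5, num_cores=8))  # 5.0
```

Under this model a 5-shard index would be up to 5x faster for a single query, which is consistent with (though more optimistic than) the observed 2x difference.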
Query latency and optimal shard size and count will depend on a large number of factors, so it is important to always test with a scenario as realistic as possible before you draw any conclusions.
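One practical way to run such a test is to create otherwise-identical indices that differ only in `number_of_shards`, load the same data into each, and time the same query against both. The helper and index names below are illustrative assumptions, not from the original thread.

```python
# Sketch of index settings for a shard-count comparison on a single node.
# Only "number_of_shards" differs between the two test indices.
def index_settings(num_shards):
    return {
        "settings": {
            "number_of_shards": num_shards,
            "number_of_replicas": 0,  # single node, so replicas stay unassigned anyway
        }
    }

# Hypothetical index names for the two setups being compared.
for name, shards in [("test_index_1shard", 1), ("test_index_5shards", 5)]:
    print(name, index_settings(shards)["settings"]["number_of_shards"])
```

Running the realistic query mix at a realistic concurrency against each index, rather than a single query, gives a much better picture of which shard count wins in production.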