Shards per CPU

Should my shard allocation strategy take available CPUs into consideration?

When using a multi-processor machine, I saw a serious slowdown in indexing and query speeds when using 1 shard, while getting much better speeds when allocating 4 shards (the machine has 8 CPUs).

If I have 1 shard, does that mean I can only use one bulk/search thread for that index? Does using multiple shards have an impact on disk I/O? (I was under the impression that writing to 1 file produces the same results as writing to 4 files in terms of disk I/O, am I wrong?)


By default a single shard already allows for concurrency, and there shouldn't be an issue running a single shard on a single machine. Maybe you can elaborate on your test scenario? Any special configuration? Special machine setup?



I have a 20-CPU machine running ES in a Docker container: a single node with a single index. I also have a small application that writes logs to a Kafka topic, and a Logstash instance subscribed to that topic that ships the logs into the index.

I then fill the topic with messages as fast as I can and let Logstash do its job. With the index set to 1 shard, indexing 500,000 messages took about 140 seconds and a complex query on that index took about 140 ms. With 5 shards, indexing took about 120 seconds and the same query about 50 ms; with 10 shards, indexing took about 100 seconds and the query about 30 ms.

This is a testing ground, so there are no other indices on the node, and each test starts by deleting the previous index and starting over. I also restart the Logstash and Kafka nodes, and before each run I warm up the Logstash instance by inserting about 100,000 records, then delete the index and only then start the test.
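For reference, the only thing that changes between runs is the primary shard count, which in ES 2.x has to be set when the index is created (it cannot be changed afterwards). A minimal sketch of the settings body I PUT before each run (the index name and replica count here are illustrative):

```python
import json

def index_settings(shards, replicas=0):
    """Settings body for creating the test index with a given
    primary shard count (fixed at creation time in ES 2.x)."""
    return json.dumps({
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    })

# e.g. PUT this body to http://localhost:9200/<index> before a run
print(index_settings(5))
```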

Forgot to mention that I'm running ES 2.4.0 without any special configuration other than mlockall.

Any ideas @spinscale?

I did some benchmarking on Elastic Cloud for Elastic{ON}, and saw an increase in indexing performance when going from 1 to 2 shards. After that, the indexing throughput gain trailed off and eventually reversed.

As queries are performed in parallel across shards, using a single thread per shard per query, the minimum query time will depend on the shard size. If you are running queries through a single thread without any indexing, I would expect better latencies against a larger number of smaller shards (up to the number of CPU cores), as more work can be done in parallel. In a real-life scenario where you need to serve multiple queries concurrently, this may however not be the most efficient setup. When you start testing concurrent indexing and querying, this type of theoretical analysis gets a lot trickier, as indexing and querying compete for the same resources and affect each other.
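As a back-of-the-envelope illustration of the single-threaded case (a toy model, not how Elasticsearch actually schedules work, and with made-up numbers), assume a query's work splits evenly across shards and at most `cores` shard-level tasks run at once:

```python
import math

def query_latency_ms(total_work_ms, shards, cores):
    """Toy model: work splits evenly across shards, one thread per
    shard per query, at most `cores` shards searched concurrently."""
    per_shard = total_work_ms / shards
    waves = math.ceil(shards / cores)  # extra "waves" once shards > cores
    return per_shard * waves

# With 20 cores, modeled latency drops as shards increase, up to
# the core count, then flattens out.
for shards in (1, 5, 10, 20, 40):
    print(shards, query_latency_ms(140, shards, cores=20))
```

Real measurements (like the 140 → 50 → 30 ms above) won't drop linearly like this, since per-shard overhead and merging the per-shard results are not modeled, but it captures why gains tail off around the core count.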

If possible, benchmark with a realistic mix of concurrent indexing and query load. This will give you the best idea of how your node/cluster performs.
