Shards per CPU

Sure,

I have a 20-CPU machine running ES in a Docker container: a single node with a single index. I also have a small application that writes logs to a Kafka topic, and a Logstash instance that subscribes to that topic and ships the logs into the index.

I then fill the topic with messages as fast as I can and let Logstash do its job. With the index set to 1 shard, it took about 140 seconds to index 500,000 messages, and a complex query on that index took about 140ms. With 5 shards, indexing took about 120 seconds and the same query took about 50ms. With 10 shards, indexing took about 100 seconds and the query took about 30ms.

This is a testing ground, so there are no other indices on that node, and for each test I delete the previous index and start all over. I also restart the Logstash and Kafka nodes. Before each test run I warm up the Logstash instance by inserting about 100,000 records, then delete the index, and only then start the test.
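To put the figures above on a common scale, here is a quick Python sketch that converts them into indexing throughput and speedup relative to the 1-shard run. The variable and function names are only illustrative, and the inputs are the approximate numbers reported above, so treat the output as rough.

```python
# Approximate figures reported above:
# shard count -> (indexing time in seconds, query time in milliseconds)
MESSAGES = 500_000
runs = {1: (140, 140), 5: (120, 50), 10: (100, 30)}


def throughput(messages, seconds):
    """Messages indexed per second."""
    return messages / seconds


# Use the 1-shard run as the baseline for both speedups.
base_index_s, base_query_ms = runs[1]

for shards, (index_s, query_ms) in sorted(runs.items()):
    rate = throughput(MESSAGES, index_s)
    print(
        f"{shards:>2} shards: {rate:,.0f} msgs/s, "
        f"indexing speedup {base_index_s / index_s:.2f}x, "
        f"query speedup {base_query_ms / query_ms:.2f}x"
    )
```

On these numbers, going from 1 to 10 shards raises indexing throughput from roughly 3,571 to 5,000 msgs/s (about 1.4x), while the query gets about 4.7x faster, so the query side benefits far more from the extra shards than the indexing side does.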

Forgot to mention that I'm running ES 2.4.0 without any special configuration other than mlockall.