How to Overallocate Intelligently

I was reading the Definitive Guide's articles on shards as a unit of scale and on shard allocation:

https://www.elastic.co/guide/en/elasticsearch/guide/current/shard-scale.html

It explains why you would shard at all, using the example of a single-shard index on a single node. Suddenly you need more capacity, so you grow your cluster to two nodes. However, since your index has only one shard, there is nothing to put on the new node, so you gain no performance. This implies that the real benefit of overallocation is increased performance in the future, when you scale up your cluster.
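For what it's worth, that scenario can be sketched in a few lines of Python. This is only a toy model, not how Elasticsearch actually allocates shards: `nodes_holding_shards` is a made-up helper that assumes primary shards are placed round-robin and ignores replicas, disk usage, and rebalancing.

```python
def nodes_holding_shards(num_shards: int, num_nodes: int) -> int:
    """Count nodes that hold at least one primary shard.

    Assumption: shards are spread round-robin across nodes.
    """
    placement = {node: [] for node in range(num_nodes)}
    for shard in range(num_shards):
        placement[shard % num_nodes].append(shard)
    # A node only contributes to this index if it holds a shard.
    return sum(1 for shards in placement.values() if shards)


for num_shards in (1, 4):
    for num_nodes in (1, 2, 4):
        busy = nodes_holding_shards(num_shards, num_nodes)
        print(f"shards={num_shards} nodes={num_nodes} nodes_in_use={busy}")
```

With one shard, the count stays at 1 no matter how many nodes are added; with four shards it grows to 2 and then 4 as the cluster grows.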

Two questions:

  1. Is my analysis of their example correct?
  2. Does this imply that once you DO reach the point where the number of nodes equals the number of shards, you will see smaller and smaller performance benefits from adding new nodes?

Obviously, the cluster will still gain whatever resources you add, but a single index will already be spread as widely as it can be, because its shards will all be on different machines at that point.

Just making sure I have a good understanding here. Thanks!!!

I think the answer you'll always hear to this question is "it depends"; however, your understanding of that simple scenario is more or less correct. Once you have one shard per node (with replicas spread out as well), adding more nodes isn't going to help at all for that particular index.
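To put rough numbers on the "one shard per node (with replicas spread out as well)" point, here is a minimal sketch. It assumes the simplification that each shard copy, primary or replica, can occupy at most one node, so the number of copies caps how many nodes a single index can use. The function names are hypothetical; the parameters are named after the `number_of_shards` and `number_of_replicas` index settings.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


def nodes_without_a_shard(number_of_shards: int,
                          number_of_replicas: int,
                          num_nodes: int) -> int:
    """Nodes that cannot hold any copy of this index (simplified model)."""
    copies = total_shard_copies(number_of_shards, number_of_replicas)
    return max(0, num_nodes - copies)


for num_nodes in (5, 10, 12):
    spare = nodes_without_a_shard(5, 1, num_nodes)
    print(f"nodes={num_nodes} nodes_with_nothing_from_this_index={spare}")
```

For an index with 5 primaries and 1 replica, that is 10 shard copies, so in this model any node beyond the 10th holds nothing from that index.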

Unfortunately, choosing the number of shards is really tricky to get right, and it depends on a lot of factors that aren't always clear at the point when you have to make the decision.

Kimbro