Too big a shard vs Too many shards

Hi Bernt,

Thanks for your comment.

Interesting thought, but from what I understand the number of indices does not physically affect anything if the total number of shards stays the same. What it really boils down to is the shard layout, in other words how many shards are being queried and across how many nodes simultaneously. To me, indices are more of a "management thing" than a "performance thing".

Quoting the definitive guide:

Searching 1 index of 50 shards is exactly equivalent to searching 50 indices with 1 shard each: both search requests hit 50 shards.

So for your 4-node setup, let's assume as an example that your search query results in requests hitting 5 shards on each node. The performance would be the same whether those 5 shards belong to 1 index or to 5 indices.

Unless I'm misunderstanding something.

From the various pieces of advice I have gathered on the web, in a utopian cluster each node would hold 1 shard (either primary or replica) of no more than 50GB of data (for ease of moving shards around), and as the total amount of data goes up we would just increase the number of nodes in the cluster (in reality only up to a certain limit due to full mesh network constraints - I was advised that a cluster should not exceed 200 nodes).

In reality we can't practically have the above so each user has to work out the best compromise for them.

In my case, my strategy is to first work out the total number of shards based on some set constraints: e.g. each shard should hold no more than 50GB of data (or 1 shard per node, etc). I know my forecasted total amount of data, hence I can estimate the total expected number of shards in the cluster. My data is time based, so I will have daily indices, and I plan to keep e.g. 30 days of retention, so I will have 30 indices in the whole cluster. I can then calculate the number of shards per index as the total number of shards in the whole cluster divided by 30 (I will then use this value as a guide to set the number of primaries + replicas in the index template).
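
To make that arithmetic concrete, here is a rough sketch in Python. The function name and the 6 TB forecast are made up for illustration, and I'm assuming the forecast already includes the replica copies, since the result is meant to cover primaries + replicas. Treat it as a back-of-the-envelope guide rather than a sizing rule.

```python
import math

def shards_per_index(total_data_gb, max_shard_gb=50, retention_days=30):
    """Estimate the shards (primaries + replicas) per daily index.

    total_data_gb is assumed to be the forecasted data held over the whole
    retention period, replica copies included (an assumption of this sketch).
    """
    # Total shards needed so that no shard holds more than max_shard_gb.
    total_shards = math.ceil(total_data_gb / max_shard_gb)
    # One index per day, so spread those shards over the retained indices.
    return math.ceil(total_shards / retention_days)

# Hypothetical forecast: 6000 GB across 30 daily indices.
# 6000 / 50 = 120 shards in the cluster, 120 / 30 = 4 shards per index.
print(shards_per_index(6000))
```

Rounding up at both steps keeps every shard under the size cap, at the cost of slightly more shards than the exact division would give.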

Cheers,