We have a scenario where we have data from multiple clients.
Some figures to note:
Number of clients: 100
Data per client: ~10 GB (average)

With the above data, what should be our Indexing strategy?

We know that with a single-index solution we will need to increase the shard count as data grows, which is not a straightforward process and requires re-indexing. Is this reason enough to rule out the single-index strategy?

But, as mentioned, data per client will only be ~10 GB, which is below the recommended shard size (30 GB). Can we still go with a multiple-index (one index per client) strategy?
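A quick way to compare the two layouts is to count how many primary shards each needs to stay under the 30 GB target mentioned above. This is a minimal sketch: `primary_shards_needed` is a hypothetical helper, and it ignores data growth, replicas, and per-index overhead.

```python
import math


def primary_shards_needed(data_gb: float, target_shard_gb: float = 30.0) -> int:
    """Smallest number of primary shards that keeps each shard at or below the target size."""
    if data_gb <= 0:
        return 1  # an index always has at least one primary shard
    return math.ceil(data_gb / target_shard_gb)


# Figures from the question: ~10 GB per client, 100 clients, 30 GB target shard size.
per_client = primary_shards_needed(10)           # one index per client
single_index = primary_shards_needed(100 * 10)   # all clients in one shared index
print(per_client, single_index)
```

With these figures each per-client index fits in one (under-filled) primary shard, while a single shared index would need around 34 primaries sized up front.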

Pros/cons we are aware of:

  1. Shard-level scaling would be a challenge with a single index (not sure how much), whereas with a multi-index strategy each index would have a pre-calculated number of shards.
  2. Search will be impacted with a single index, since every query is spread across all the shards of one large index, whereas with multiple indexes a search only has to touch the relevant client's index.
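To make point 2 concrete, here is a minimal sketch of how a request for one client would be targeted under each strategy. The index names `all-clients` and `client-<id>`, the `client_id` field, and both helper functions are hypothetical; the code only builds the target index name and query body and does not call Elasticsearch.

```python
from typing import Any, Dict, Tuple

Query = Dict[str, Any]


def single_index_search(client_id: str, user_query: Query) -> Tuple[str, Query]:
    """Shared index: every request must filter on a client field (hypothetical name)."""
    body = {
        "query": {
            "bool": {
                "must": [user_query],
                # The filter restricts hits to one client, but the request
                # still fans out to every shard of the shared index.
                "filter": [{"term": {"client_id": client_id}}],
            }
        }
    }
    return "all-clients", body


def per_client_search(client_id: str, user_query: Query) -> Tuple[str, Query]:
    """One index per client: the index name itself scopes the search."""
    return f"client-{client_id}", {"query": user_query}


q = {"match": {"message": "error"}}
print(single_index_search("acme", q)[0])
print(per_client_search("acme", q)[0])
```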


Going with a single index is a bad idea: your total data is about 1 TB (100 clients × 10 GB).
Go with one index per client; you should end up with 100 indexes.
Add 1 replica to each index for HA, and you end up with 200 shards for 100 indexes (about 10 GB per shard, or 20 GB per index including its replica).
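The arithmetic behind this, assuming one primary shard per index and ~10 GB per client as stated in the question, can be checked with a small sketch. `Layout` is a hypothetical helper; it treats every client as the same size and ignores index overhead.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    indexes: int
    primaries_per_index: int
    replicas: int
    data_per_index_gb: float  # primary data only, before replication

    def total_shards(self) -> int:
        # Each primary has `replicas` copies, so 1 + replicas shards per primary.
        return self.indexes * self.primaries_per_index * (1 + self.replicas)

    def stored_gb(self) -> float:
        # Every replica holds a full copy of its primary's data.
        return self.indexes * self.data_per_index_gb * (1 + self.replicas)

    def gb_per_shard(self) -> float:
        return self.stored_gb() / self.total_shards()


per_client = Layout(indexes=100, primaries_per_index=1, replicas=1, data_per_index_gb=10)
print(per_client.total_shards(), per_client.stored_gb(), per_client.gb_per_shard())
```

Here 1 TB of raw data becomes roughly 2 TB on disk once the replica is counted, spread over 200 shards.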

If you use ILM (index lifecycle management) with this, you'll be able to manage things a little more easily.
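As an illustration of what an ILM policy body could look like, here is a minimal sketch that builds the JSON sent to `PUT _ilm/policy/<name>`. The 30 GB rollover threshold and the 90-day deletion age are placeholder assumptions, not recommendations; `max_primary_shard_size` assumes an Elasticsearch version that supports it, and rollover also needs a write alias or data stream per client, which is not shown.

```python
import json


def ilm_policy(max_primary_shard_size: str = "30gb", delete_after: str = "90d") -> dict:
    """Body for PUT _ilm/policy/<name>; thresholds are placeholder assumptions."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new backing index once a primary shard reaches this size.
                        "rollover": {"max_primary_shard_size": max_primary_shard_size}
                    }
                },
                "delete": {
                    # When rollover is used, age is measured from the rollover time.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


print(json.dumps(ilm_policy(), indent=2))
```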

