Elastic sharding strategy

I have a question about sharding strategy and performance:

When creating an index, the default is 1 primary shard and 1 replica.
When creating an ILM policy, the default rollover conditions are 30 days / 50 GB.
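For reference, here is a minimal sketch of the body for `PUT _ilm/policy/<policy-name>` with those rollover conditions. The helper name `rollover_policy` is hypothetical. Which size condition applies depends on your version: newer releases use `max_primary_shard_size` (size of the largest primary shard), while older ones only have `max_size` (total size of all primaries). With a single primary shard the two mean the same thing.

```python
import json

def rollover_policy(max_age: str = "30d", max_primary_shard_size: str = "50gb") -> dict:
    """Hypothetical helper: ILM policy body that rolls over in the hot phase."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            # Rollover happens as soon as EITHER condition is met.
                            "max_age": max_age,
                            "max_primary_shard_size": max_primary_shard_size,
                        }
                    }
                }
            }
        }
    }

print(json.dumps(rollover_policy(), indent=2))
```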

So when working with the defaults, I'll have indices of 100 GB (primary + replica).

So far so good.
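The 100 GB figure is just primaries × (1 + replicas) × shard size. A tiny sketch of that arithmetic (the function name is my own) also shows that splitting into more, smaller primaries does not change the total disk footprint:

```python
def index_storage_gb(primary_shards: int, replicas: int, shard_size_gb: float) -> float:
    """Approximate on-disk size of one index: each primary plus its replica copies."""
    return primary_shards * (1 + replicas) * shard_size_gb

print(index_storage_gb(1, 1, 50.0))  # 100.0  (defaults: one 50 GB primary + 1 replica)
print(index_storage_gb(2, 1, 25.0))  # 100.0  (two 25 GB primaries + 1 replica each)
```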

BUT: I saw in an article (sorry, I lost the link) that smaller shards (though not too many of them) give better performance.

My questions are (sorry for sending them all at once):

  1. Isn't a 50 GB shard too large? (I know it's the recommended maximum, but why is it the default?)
  2. If I split it into 2 indices with 25 GB shards, should queries be faster (in the average case)?
  3. If I already have an alias pointing to several indices (for example logs-alias: logs-00001, logs-00002, etc.), can I use the split API to split all of them? Technically, how should I do it so that the alias still serves queries?
  4. Will adding a few replicas give the same performance improvement (for queries, not inserts) as splitting the indices into more primary shards?

Thank you

This depends on the use case, and possibly also on how much data you ingest and how long you keep it. In my experience, performance problems caused by too many small shards are much more common than problems caused by too-large shards.

Each shard is queried in a single thread, so the size of the shard affects the minimum latency that can be achieved. Querying two 25 GB shards is likely faster than querying one 50 GB shard, as two threads can be used. It is also possible that querying twenty 25 GB shards is faster than querying ten 50 GB shards. At some point, however, querying a larger number of shards will start to perform worse. Exactly where that point lies depends on a lot of factors, so you need to test what is right for your use case.
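To illustrate the trade-off, here is a deliberately simplified toy model, not how Elasticsearch actually schedules searches. It assumes each shard is scanned by one thread at a fixed cost per GB, that only `threads` shards can be scanned at once, and that every shard adds a fixed coordination overhead. All parameter values are made up, so only the shape of the curve matters, not the numbers.

```python
import math

def toy_latency(total_gb, shards, threads, sec_per_gb=0.01, overhead_per_shard=0.005):
    """Toy estimate of one search over `total_gb` of data split into `shards` shards."""
    shard_gb = total_gb / shards
    # One thread per shard; with more shards than threads they run in "waves".
    waves = math.ceil(shards / threads)
    scan_time = waves * shard_gb * sec_per_gb
    # Fixed per-shard cost (coordination, merging results, ...), assumed constant.
    overhead = shards * overhead_per_shard
    return scan_time + overhead

for shards in (1, 2, 4, 8, 16, 64):
    print(f"{shards:>2} shards: {toy_latency(50, shards, threads=4):.3f} s")
```

With these assumed parameters, going from 1 to 2 shards roughly halves the latency, the minimum is at 4 shards (one per thread), and beyond that the per-shard overhead makes things slower again. On a real cluster the crossover point has to be found by testing.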

Adding replicas is generally done to add resiliency or increase the number of queries per second the cluster can handle. I would not expect it to affect latency much.
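A similarly simplified sketch of why replicas help throughput rather than latency. It assumes that a search hits exactly one copy of each shard and that each copy, on its own node, handles one search at a time; both are simplifications, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ToyEstimate:
    latency_s: float  # time for a single search (unchanged by replicas here)
    max_qps: float    # searches per second all copies can absorb together

def toy_replica_effect(replicas: int, latency_s: float) -> ToyEstimate:
    # Each search still scans one copy of every shard, so latency stays the same;
    # more copies only let more searches run side by side.
    copies = 1 + replicas
    return ToyEstimate(latency_s=latency_s, max_qps=copies / latency_s)

print(toy_replica_effect(0, 0.5))
print(toy_replica_effect(2, 0.5))
```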

Thanks so much, @Christian_Dahlqvist.

Last thing: can I split all indices referenced by an alias at once?
After the split, will I need to point the alias to the resulting indices?

I think you need to split them individually.
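A minimal sketch of doing that per index and then repointing the alias. It only builds the REST calls as `(method, path, body)` tuples; sending them is left out. The helper `split_plan` and the `-split` naming suffix are hypothetical, and the sketch assumes each source index has `source_shards` primaries. The endpoints themselves (`_settings` with `index.blocks.write`, `_split`, `_cluster/health` with `wait_for_status`, `_aliases`) are the standard ones, but check the docs for your version. Leave the current write index out, since the write block would stop ingestion; for future indices, raising `index.number_of_shards` in the index template is simpler.

```python
import json

def split_plan(alias, indices, target_shards, source_shards=1, suffix="-split"):
    """Return (method, path, body) tuples; body is None for calls without one."""
    # The split API needs a target shard count that is a multiple of the source's.
    if target_shards <= source_shards or target_shards % source_shards != 0:
        raise ValueError("target_shards must be a larger multiple of source_shards")
    requests = []
    for index in indices:
        target = index + suffix  # hypothetical naming scheme for the new index
        # 1. The source index must be read-only before it can be split.
        requests.append(("PUT", f"/{index}/_settings",
                         {"settings": {"index.blocks.write": True}}))
        # 2. Create the new index with more primary shards.
        requests.append(("POST", f"/{index}/_split/{target}",
                         {"settings": {"index.number_of_shards": target_shards}}))
        # 3. Wait until the new index is fully allocated.
        requests.append(("GET", f"/_cluster/health/{target}?wait_for_status=green", None))
    # 4. Repoint the alias for all indices in one _aliases request.
    actions = []
    for index in indices:
        actions.append({"add": {"index": index + suffix, "alias": alias}})
        actions.append({"remove": {"index": index, "alias": alias}})
    requests.append(("POST", "/_aliases", {"actions": actions}))
    return requests

for method, path, body in split_plan("logs-alias", ["logs-00001", "logs-00002"], 2):
    print(method, path, json.dumps(body) if body is not None else "")
```

Putting all the `add`/`remove` actions in a single `_aliases` request means they are applied atomically, so queries through the alias never see both the old and new copy of an index. The original indices still use disk space until you delete them.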
