Elastic sharding strategy

alexs1 · May 13, 2021, 3:12pm

Hello,
I have a question about sharding strategy and performance:

When creating an index, the default is 1 primary shard and 1 replica.
When creating an ILM policy, the default rollover is 30 days / 50 GB.

So when working with the defaults, I'll have indices of 100 GB (primary + replica).

So far so good.

BUT: I saw in an article (sorry, I lost the link) that smaller shards (as long as there are not too many of them) give better performance.

My questions are (sorry for sending them all at once):

Isn't a 50 GB shard too much? (I know it's the recommended maximum, but why is it the default?)
If I split it into 2 shards of 25 GB each, should queries run faster (in the average case)?
If I already have an alias pointing to several indices (for example logs-alias: logs-00001, logs-00002, etc.), can I use the split API to split all of them? How should I do it technically, so that the alias will still serve queries?
Will adding a few replicas give the same performance improvement (for queries, not inserts) as splitting the indices into more primary shards?

Thank you

Christian_Dahlqvist · May 13, 2021, 3:57pm

This depends on the use case and possibly also on how much data you ingest and how long you keep it. In my experience it is much more common to have performance problems due to too many small shards than from too large shards.

Each shard is queried in a single thread, so the size of the shard affects the minimum latency that can be achieved. Querying two 25GB shards is likely faster than querying one 50GB shard, as two threads can be used. It is also possible that querying twenty 25GB shards is faster than querying ten 50GB shards. At some point, however, querying a larger number of shards starts showing worse performance. Exactly when this occurs depends on a lot of factors, so you need to test what is right for your use case.
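
To make the trade-off concrete, here is a rough toy model in Python. It is only a sketch: the function name, the per-GB scan cost, the per-shard overhead and the thread count are all made-up assumptions, not measured Elasticsearch figures. It only illustrates why latency can first drop and then rise again as the shard count grows.

```python
import math

def estimated_latency(total_gb, shards, search_threads,
                      seconds_per_gb=0.02, overhead_per_shard=0.005):
    """Toy estimate of single-query latency in seconds.

    All constants are hypothetical; real numbers depend on the data,
    the queries and the hardware, so they have to be measured.
    """
    per_shard_gb = total_gb / shards
    # One thread per shard: shards beyond the available threads
    # have to wait for a later "wave".
    waves = math.ceil(shards / search_threads)
    scan_time = waves * per_shard_gb * seconds_per_gb
    # Assumed fixed cost per shard for coordination and merging results.
    return scan_time + shards * overhead_per_shard

# Compare a few shard counts for the same 50 GB of data.
for n in (1, 2, 10, 20, 100):
    print(n, round(estimated_latency(50, n, search_threads=8), 3))
```

With these assumed constants, two shards come out faster than one, and very high shard counts come out slower again. Where the turning point sits in a real cluster is exactly what needs testing.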

Adding replicas is generally done to add resiliency or increase the number of queries per second the cluster can handle. I would not expect it to affect latency much.

alexs1 · May 13, 2021, 4:05pm

Thanks so much @Christian_Dahlqvist.

One last thing: can I split all indices referenced by an alias at once?
Will I need to point the alias to the result after the split?

Christian_Dahlqvist · May 13, 2021, 5:28pm

I think you need to split them individually.
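
As a rough sketch of what splitting them one by one could look like, the Python helper below only builds the REST requests per index and sends nothing. The helper name `split_plan` and the `-split` target suffix are hypothetical. The three calls are the settings update that blocks writes, the split API, and the `_aliases` API that moves the alias in one atomic step. It assumes the target shard count is a multiple of the source's primary shard count and that the source index can be made read-only.

```python
def split_plan(alias, sources, target_shards, suffix="-split"):
    """Return (method, path, body) tuples to run in order for each index.

    Hypothetical helper: nothing is sent to a cluster here.
    """
    plan = []
    for source in sources:
        target = source + suffix  # assumed naming scheme for the new index
        # 1. The split API requires the source index to block writes.
        plan.append(("PUT", f"/{source}/_settings",
                     {"settings": {"index.blocks.write": True}}))
        # 2. Split into a new index with more primary shards.
        plan.append(("POST", f"/{source}/_split/{target}",
                     {"settings": {"index.number_of_shards": target_shards}}))
        # 3. Swap the alias from source to target in one atomic request.
        #    Wait until the target index is ready before running this.
        plan.append(("POST", "/_aliases", {"actions": [
            {"add": {"index": target, "alias": alias}},
            {"remove": {"index": source, "alias": alias}},
        ]}))
    return plan

for method, path, body in split_plan("logs-alias",
                                     ["logs-00001", "logs-00002"], 2):
    print(method, path, body)
```

Because the alias is switched per index, queries through logs-alias keep working while the indices are migrated one at a time. The old index can be deleted once the new one has been verified.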

system · June 10, 2021, 5:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
