Shard balance and uniformity of shard sizes

I understand the need for a well-balanced ES cluster, with indices made of the right number of shards (not too many, but not too few), but I was wondering whether keeping shard sizes consistent is also important.
Depending on the purpose of each index, I have yearly, monthly and daily indices, which leads to very different shard sizes. Is this a problem in the end?

Hello Sebastien,

We have already had a problem with this, so I can safely say: yes, this can be a problem! Elasticsearch balances shards across nodes primarily based on shard count, which makes sense given the best-practice guideline of at most 20 shards per 1GB of heap.

We have a cluster of 3 equally sized nodes (CPU, RAM, storage), but our shard sizes were very different: some shards were 25GB and some were only 1GB (I know...).

In the end, one node raised an alert because its storage was nearly full (>80%), while the other 2 nodes still had >40% free.
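To illustrate how count-based balancing can leave disk usage very uneven, here is a toy simulation. It is a deliberate simplification: it assumes a greedy "put each shard on the node with the fewest shards" rule, which is not Elasticsearch's actual balancer (the real one weighs several factors), and the node capacities, shard sizes and creation order are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity_gb: float
    shards: list[float] = field(default_factory=list)  # sizes of assigned shards, in GB

    @property
    def used_pct(self) -> float:
        return 100.0 * sum(self.shards) / self.capacity_gb


def assign_by_count(nodes: list[Node], shard_sizes_gb: list[float]) -> None:
    """Toy rule: place each shard on the node currently holding the fewest shards.

    Ties go to the first node in list order. Shard size is ignored entirely,
    which is exactly what makes disk usage drift apart.
    """
    for size in shard_sizes_gb:
        target = min(nodes, key=lambda n: len(n.shards))
        target.shards.append(size)


# Hypothetical: 3 nodes with 100GB each, and an unlucky creation order of
# one 25GB shard followed by two 1GB shards, repeated three times.
nodes = [Node(f"node-{i}", capacity_gb=100.0) for i in range(3)]
assign_by_count(nodes, [25.0, 1.0, 1.0] * 3)
for n in nodes:
    print(n.name, len(n.shards), f"{n.used_pct:.0f}%")
# node-0 3 75%
# node-1 3 3%
# node-2 3 3%
```

Every node ends up with the same shard count, yet one node holds almost all of the data.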

The best way to solve this is to make all shards roughly the same size. Otherwise (and this was our temporary workaround at the time), you can look at index-level shard allocation filtering. We used the allocation settings to pin the different large indices to specific nodes within the cluster to better distribute the load.
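For reference, pinning an index to particular nodes can be done with the index setting `index.routing.allocation.include._name`, applied through the update index settings API (`PUT /<index>/_settings`). The sketch below only builds that request with the Python standard library; the cluster URL, index name and node names are hypothetical, and authentication/TLS are left out.

```python
import json
import urllib.request


def pin_index_body(node_names: list[str]) -> dict:
    # `include._name`: a shard may only be allocated to a node whose name
    # matches at least one of the comma-separated values.
    return {"index.routing.allocation.include._name": ",".join(node_names)}


def build_pin_request(base_url: str, index: str, node_names: list[str]) -> urllib.request.Request:
    body = json.dumps(pin_index_body(node_names)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical names, for illustration only.
req = build_pin_request("http://localhost:9200", "logs-yearly-2021", ["node-1"])
# urllib.request.urlopen(req) would send it to a real cluster.
```

Allocation filters are a manual override, so they need revisiting whenever nodes are added, removed or renamed.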

Best regards
Wolfram


The problem here seems to be that you set up an overly sensitive alert. Elasticsearch doesn't mind imbalanced storage like this. Really, you should only alert if you're approaching the low watermark (85% by default) on all nodes, or if one node is persistently over the high watermark (90% by default). Outside of those conditions no action is really needed, so an alert is inappropriate. Moving shards around is expensive (it blows out the filesystem cache, for instance), so it's usually preferable to leave things alone.

See these docs for more details, in particular:

NOTE: It is normal for nodes to temporarily exceed the high watermark from time to time.

and

TIP: It is normal for the nodes in your cluster to be using very different amounts of disk space. ..
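The alerting condition suggested above can be written as a small predicate. This is only a sketch: the function name, the 5-point margin used to define "approaching", the 30-minute threshold for "persistently", and the idea of passing in how long each node has been over the high watermark are all hypothetical choices. Only 85% and 90% are the documented defaults for `cluster.routing.allocation.disk.watermark.low` and `.high`.

```python
LOW_WATERMARK_PCT = 85.0   # default cluster.routing.allocation.disk.watermark.low
HIGH_WATERMARK_PCT = 90.0  # default cluster.routing.allocation.disk.watermark.high


def should_alert(
    disk_used_pct: dict[str, float],
    minutes_over_high: dict[str, float],
    approach_margin_pct: float = 5.0,    # hypothetical: "approaching" = within 5 points
    persistent_after_min: float = 30.0,  # hypothetical: what counts as "persistently"
) -> bool:
    if not disk_used_pct:
        return False  # no nodes reported, nothing to evaluate
    # Condition 1: every node is close to (or past) the low watermark,
    # i.e. the cluster as a whole is running out of room.
    all_near_low = all(
        pct >= LOW_WATERMARK_PCT - approach_margin_pct for pct in disk_used_pct.values()
    )
    # Condition 2: some node has stayed above the high watermark for a long time.
    # Brief excursions over the high watermark are normal and ignored.
    any_stuck_high = any(
        pct > HIGH_WATERMARK_PCT and minutes_over_high.get(node, 0.0) >= persistent_after_min
        for node, pct in disk_used_pct.items()
    )
    return all_near_low or any_stuck_high


# The situation described earlier in this thread: one node >80%, two nodes <60%.
print(should_alert({"node-0": 81.0, "node-1": 58.0, "node-2": 55.0}, {}))  # False
```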

As far as I know we haven't changed the alert settings, so I guess this is the out-of-the-box alert...

Where did the box in question come from? I don't think Elasticsearch itself ships with any such alert. If it does, that's definitely a bug.

The installation comes directly from Elastic; the alert is even well documented:

This rule checks for Elasticsearch nodes that are nearly at disk capacity. By default, the condition is set at 80% or more averaged over the last 5 minutes. The default rule checks on a schedule time of 1 minute with a re-notify interval of 1 day.
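To make that documented default concrete: the rule fires for a node when its disk usage, averaged over the last 5 minutes, is 80% or more. The sketch below reproduces only that condition for a single node; it is not Kibana's implementation, and the `(timestamp, percent)` sample format is an assumption.

```python
from datetime import datetime, timedelta

THRESHOLD_PCT = 80.0
WINDOW = timedelta(minutes=5)


def disk_rule_fires(samples: list[tuple[datetime, float]], now: datetime) -> bool:
    """Return True if the mean disk usage in the window (now - 5 min, now] is >= 80%."""
    recent = [pct for ts, pct in samples if now - WINDOW < ts <= now]
    if not recent:
        return False  # no data in the window: nothing to evaluate
    return sum(recent) / len(recent) >= THRESHOLD_PCT


now = datetime(2021, 6, 1, 12, 0)
# Hypothetical: one sample per minute for a node sitting just above 80%.
samples = [(now - timedelta(minutes=m), 81.0) for m in range(5)]
print(disk_rule_fires(samples, now))  # True
```

Note that this condition looks at each node on its own, which is why a single full node can trigger it even when the cluster as a whole has plenty of space.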

As long as alerts provided by Elastic are firing, I will treat them as urgent problems...

TIL, I did not know this was one of our built-in alerts. I have reported this as a bug:

Thanks @Wolfram_Haussig and @DavidTurner for sharing, it helps me.
@DavidTurner, can a cluster with well-balanced shard counts be slowed down (for reads and writes) because shard storage is imbalanced?

Best regards
Sebastien
