Shards balance and homogeneity of their sizes

sebastienf · August 17, 2021, 10:09am

I understand the need for a well-balanced ES cluster, with indices made up of enough shards (not too many, not too few), but I was wondering whether keeping shard sizes consistent is important.
Depending on the purpose of the index, I have yearly, monthly, and daily indices, which leads to very different shard sizes: is this a problem in the end?

Wolfram_Haussig · August 19, 2021, 7:40am

Hello Sebastien,

We have already had a problem with this, so I can safely say that yes, this can be a problem! Elasticsearch balances shards across the nodes based on shard count, which makes sense given that the best practices recommend at most 20 shards per GB of heap.

We have a cluster of 3 equally sized nodes (CPU, RAM, storage), but our shard sizes were very different: some were 25GB per shard and some were only 1GB per shard (I know...).

In the end, one node raised an alert because its storage was nearly full (>80%) while the other 2 nodes still had >40% free.
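As a toy illustration of how balancing by shard count alone can leave disk usage uneven, the sketch below places each shard on whichever node currently holds the fewest shards. It is not Elasticsearch's actual balancer; the function name `allocate_by_count`, the shard order, and the sizes are made up for the example.

```python
from typing import List


def allocate_by_count(shard_sizes_gb: List[float], node_count: int) -> List[List[float]]:
    """Toy allocator: put each shard on the node holding the fewest shards.

    Hypothetical simplification; shard size is deliberately ignored.
    """
    nodes: List[List[float]] = [[] for _ in range(node_count)]
    for size in shard_sizes_gb:
        # min() returns the first node with the lowest shard count,
        # so ties go to the lowest node index.
        target = min(nodes, key=len)
        target.append(size)
    return nodes


# Made-up mix of large (25 GB) and small (1 GB) shards.
shards = [25, 1, 1, 25, 1, 1, 25, 1, 1]
nodes = allocate_by_count(shards, node_count=3)

for i, node in enumerate(nodes):
    print(f"node-{i}: {len(node)} shards, {sum(node)} GB")
```

Every node ends up with the same number of shards, yet one node holds all of the large ones, which is the kind of imbalance described above.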

The best way to solve this is to make all shards more or less the same size. Otherwise - and that was our temporary workaround then - you can take a look at the index shard allocation. We used the allocation settings to pin the different large indices to specific nodes within the cluster to better distribute the load.
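A minimal sketch of what such pinning could look like, assuming the `index.routing.allocation.include._name` index setting is used for it. The helper `pin_index_request`, the index names, and the node names are hypothetical; the code only builds the request and does not send anything to a cluster.

```python
import json
from typing import Dict, List, Tuple


def pin_index_request(index: str, node_names: List[str]) -> Tuple[str, str, str]:
    """Build (method, path, body) for an update-index-settings request that
    restricts the index's shards to the named nodes."""
    # include._name takes a comma-separated list of node names.
    body = {"index.routing.allocation.include._name": ",".join(node_names)}
    return "PUT", f"/{index}/_settings", json.dumps(body)


# Hypothetical mapping of large indices to the nodes that should hold them.
pins: Dict[str, List[str]] = {
    "logs-yearly-2021": ["node-1", "node-2"],
    "metrics-yearly-2021": ["node-2", "node-3"],
}

for index, node_names in pins.items():
    method, path, body = pin_index_request(index, node_names)
    print(method, path, body)
```

Listing at least two nodes per index leaves room for a replica, since a primary and its replica cannot be allocated to the same node.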

Best regards
Wolfram

DavidTurner · August 19, 2021, 8:06am

The problem here seems to be that you set up an overly sensitive alert. Elasticsearch doesn't mind having imbalanced storage like this. Really, you should only alert if you're approaching the low watermark (85% by default) on all nodes or if one node is persistently over the high watermark (90% by default). Outside of those conditions no action is needed, so an alert is inappropriate. Moving shards around is expensive (it blows out the filesystem cache, for instance), so it's usually preferable to leave things alone.

See these docs for more details, in particular:

NOTE: It is normal for nodes to temporarily exceed the high watermark from time to time.

and

TIP: It is normal for the nodes in your cluster to be using very different amounts of disk space. ..
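The alerting condition described in this reply can be sketched roughly as follows. This is only an illustration: the function `needs_attention`, the `margin` used to decide what counts as "approaching", and the rule that "persistently" means every sample in the window are assumptions, not anything Elasticsearch or Kibana ships.

```python
from typing import Dict, List

LOW_WATERMARK = 0.85   # default low disk watermark (fraction of disk used)
HIGH_WATERMARK = 0.90  # default high disk watermark


def needs_attention(history: Dict[str, List[float]], margin: float = 0.05) -> bool:
    """Decide whether disk usage warrants an alert.

    history maps a node name to its recent disk usage samples (fraction used,
    oldest first). Each list is assumed to be non-empty.
    """
    if not history:
        return False
    latest = [samples[-1] for samples in history.values()]
    # Assumption: "approaching" means within `margin` of the low watermark.
    all_near_low = all(usage >= LOW_WATERMARK - margin for usage in latest)
    # Assumption: "persistently" means every sample in the window is over the high watermark.
    any_stuck_high = any(
        all(usage > HIGH_WATERMARK for usage in samples)
        for samples in history.values()
    )
    return all_near_low or any_stuck_high


# One node above 80% while the others have plenty of space: no alert.
print(needs_attention({"n1": [0.81, 0.82], "n2": [0.55, 0.55], "n3": [0.50, 0.52]}))
# One node over the high watermark for the whole window: alert.
print(needs_attention({"n1": [0.91, 0.92], "n2": [0.55, 0.55], "n3": [0.50, 0.52]}))
```

A node that only briefly exceeds the high watermark does not trigger this check, in line with the note quoted above.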

Wolfram_Haussig · August 19, 2021, 8:13am

I am not aware that we changed the alert settings so I guess this is the alert out of the box...

DavidTurner · August 19, 2021, 9:52am

Where did the box in question come from? I don't think Elasticsearch itself ships with any such alert. If it does, that's definitely a bug.

Wolfram_Haussig · August 19, 2021, 10:08am

The installation comes directly from Elastic, and the alert is even well documented:

This rule checks for Elasticsearch nodes that are nearly at disk capacity. By default, the condition is set at 80% or more averaged over the last 5 minutes. The default rule checks on a schedule time of 1 minute with a re-notify interval of 1 day.

As long as alerts provided by Elastic are firing I will handle them as an urgent problem...

DavidTurner · August 19, 2021, 11:06am

TIL, I did not know this was one of our built-in alerts. I have reported this as a bug:

github.com/elastic/kibana

Disk usage threshold alert fires well before action is needed

opened 11:05AM - 19 Aug 21 UTC

DaveCTurner

bug Team:Alerting Services

**Kibana version:** 7.14.0 (likely others)

**Elasticsearch version:** 7.14.0 …(likely others)

**Describe the bug:** The default [disk usage threshold alert](https://www.elastic.co/guide/en/kibana/current/kibana-alerts.html#kibana-alerts-disk-usage-threshold) will fire when a single node reaches 80% capacity even though no action is needed at this point. Elasticsearch itself doesn't react at all until disk usage reaches the low watermark (85% capacity by default) and only really starts putting any effort in when it reaches the high watermark (90% capacity by default); intervention from a user is only needed once Elasticsearch runs out of options for moving shards around. See [these Elasticsearch docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#disk-based-shard-allocation) for more info, noting in particular:

> **NOTE**: It is normal for nodes to temporarily exceed the high watermark from time to time.

**Steps to reproduce:**

1. Install Elasticsearch & Kibana with default settings
2. Fill one of the disks up to over 80% capacity
3. Note that the disk usage threshold alert fires even though Elasticsearch logs no warnings about disk usage.

**Expected behavior:** The alert should only fire when action is needed from the user, which in this context means a node is persistently over its high watermark, or the cluster in total is approaching its low watermark capacity. We shouldn't be firing an alert for a situation that Elasticsearch considers to be normal.

**Any additional context:** Raised in https://discuss.elastic.co/t/shards-balance-and-homogeneity-of-their-sizes/281661

sebastienf · August 20, 2021, 10:29am

Thanks @Wolfram_Haussig and @DavidTurner for sharing, it helps.
@DavidTurner, can a cluster with a well-balanced shard count still be slowed down (reads and writes) because shard storage is imbalanced?

Best regards
Sebastien

system · September 17, 2021, 10:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
