total_shards_per_node combined with high disk usage causes shards to stay unallocated

Hi,

I noticed that some of my indices are staying in a yellow state with missing replica shards. When I check the allocation explain output, I can see it's because:

  • I have set total_shards_per_node to 1 to ensure those indices are evenly distributed, because they are write heavy, so there are as many shards + replicas as there are nodes (the exact setting is sketched below)
  • The remaining node where the missing shard should go has reached the disk space threshold, so the shard cannot be allocated to it, and it cannot go anywhere else since every other node already has a shard of this index
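For reference, this is roughly how those indices are configured and how I checked the allocation (the index name is just an example, not my real one):

PUT my-write-heavy-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 1
}

GET _cluster/allocation/explain
{
  "index": "my-write-heavy-index",
  "shard": 0,
  "primary": false
}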

I need to keep one shard per node, and as I understand it, disk space is not an important criterion for shard distribution (Elasticsearch mostly uses the load, I think?).
The issue here is that Elasticsearch does not seem to consider it a problem that one of the nodes has reached the threshold. If it did, it could move some of the indices that are movable (there are some) in order to regain space and be able to allocate the missing shards.
Instead, it just stays in this state, with shards that will never be allocated unless I make some room myself by relocating other shards manually.

You can see in the attached image that the disk usage is bad on 2-3 nodes but fine on the others, and they can theoretically all hold the same data (same roles on every node).
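(If it helps, the same numbers can be pulled from the cluster itself rather than from the OS; I'm looking at the output of the cat allocation API, which lists disk used/available and shard count per node:

GET _cat/allocation?v

)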

Is there any way to make Elasticsearch do this kind of thing automatically?

One thing I can think of is to make the low watermark equal to the high watermark; this way, if ES stops allocating to a node it will also relocate shards away from it, so it can allocate again. Am I right, and if so, is there any downside to this?

Thanks!

You can set up shard distribution by disk:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#disk-based-shard-allocation

Thanks for your answer.
I know about that, but from what I understand, Elasticsearch prioritizes the load criterion over the disk one. That's why, I guess, the cluster leaves that node above the low watermark even though that prevents the allocation of shards that can only go to that node: moving things away would break the load criterion.

My current configuration is the default: 85% low and 90% high watermark.
My suggested solution, on which I'd like advice about possible drawbacks, is based on the fact that:

  • The low watermark controls whether Elasticsearch can allocate new shards to a node
  • The high watermark controls whether Elasticsearch will try to relocate shards away from a node to free up space

So my guess is that by setting the low and high watermarks to the same value, the cluster will never (or only very temporarily) stay in the state it is in right now: as soon as there is no more room to allocate a shard, it will immediately try to make some.
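Concretely, what I have in mind is something like this (the 85% value is just to illustrate making both watermarks equal, not a recommendation):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "85%"
  }
}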

This is what my cluster's disks look like:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       931G  439G  492G  48% /s1
/dev/sdb1       931G  456G  475G  49% /s1
/dev/sdb1       931G  519G  413G  56% /s1

Hmm, OK, but how is that relevant to my question? :smile:

I have never tried that, as I generally try to keep a good amount of space free. Once I had all disks above the low watermark and it was a constant battle to move stuff around; everything got slower because of that.