total_shards_per_node combined with high disk usage causes shards to stay unallocated

Hi,

I noticed that some of my indices are staying in a yellow state with missing replica shards. When I check the allocation explain output, I can see it's because:

  • I have set total_shards_per_node to 1 to ensure those indices are evenly distributed, because they are write heavy, so there are as many shards + replicas as there are nodes (the exact setting is sketched below)
  • The remaining node where the missing shard should go has reached the disk space threshold, so the shard cannot be allocated to it, and it cannot go anywhere else since every other node already has a shard of this index
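For reference, this is roughly how those indices are configured and how I checked the allocation (the index name is just an example, not my real one):

PUT my-write-heavy-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 1
}

GET _cluster/allocation/explain
{
  "index": "my-write-heavy-index",
  "shard": 0,
  "primary": false
}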

I need to keep one shard per node, and as I understand it, disk space is not an important criterion for shard distribution (Elasticsearch mostly uses the load, I think?).
The issue here is that Elasticsearch does not seem to consider it a problem that one of the nodes has reached the threshold. If it did, it could move some of the indices that are movable (there are some) in order to regain space and be able to allocate the missing shards.
Instead, it just stays in this state, with shards that will never be allocated unless I make some room myself by relocating other shards manually.

You can see in the attached image that the disk usage is bad on 2-3 nodes but fine on the others, and they can theoretically all hold the same data (same roles on every node).
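(If it helps, the same numbers can be pulled from the cluster itself rather than from the OS; I'm looking at the output of the cat allocation API, which lists disk used/available and shard count per node:

GET _cat/allocation?v

)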

Is there any way to make Elasticsearch do this kind of thing automatically?

One thing I can think of is to make the low watermark equal to the high watermark; this way, if ES stops allocating to a node it will also relocate shards away from it, so it can allocate again. Am I right, and if so, is there any downside to this?

Thanks!

You can set up shard distribution by disk:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#disk-based-shard-allocation

Thanks for your answer.
I know about that, but from what I understand, Elasticsearch prioritizes the load criterion over the disk one. That's why, I guess, the cluster leaves that node above the low watermark even though that prevents the allocation of shards that can only go to that node: moving things away would break the load criterion.

My current configuration is the default: 85% low and 90% high watermark.
My suggested solution, on which I'd like advice about possible drawbacks, is based on the fact that:

  • The low watermark controls whether Elasticsearch can allocate new shards to a node
  • The high watermark controls whether Elasticsearch will try to relocate shards away from a node to free up space

So my guess is that by setting the low and high watermarks to the same value, the cluster will never (or only very temporarily) stay in the state it is in right now: as soon as there is no more room to allocate a shard, it will immediately try to make some.
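Concretely, what I have in mind is something like this (the 85% value is just to illustrate making both watermarks equal, not a recommendation):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "85%"
  }
}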

This is what my cluster's disks look like:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       931G  439G  492G  48% /s1
/dev/sdb1       931G  456G  475G  49% /s1
/dev/sdb1       931G  519G  413G  56% /s1

Hmm, OK, but how is that relevant to my question? :smile:

I have never tried that, as I generally try to keep a good amount of space free. Once I had all disks above the low watermark and it was a constant battle to move stuff around; everything got slower because of that.