I noticed that some of my indices are stuck in a yellow state with unassigned replica shards. The allocation explain output shows why:
I have set total_shards_per_node to 1 so that these write-heavy indices are evenly distributed, which means there are as many shard copies (primaries plus replicas) as there are nodes.
The one remaining node that should receive the missing shard has reached the disk space threshold, so the shard cannot be allocated to it, and it cannot be allocated anywhere else because every other node already holds a shard of this index.
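To make the deadlock concrete, here is a toy model of the two rules involved. It is only a sketch of the situation described above, not Elasticsearch's actual allocation code; the `Node` class, the `blockers` helper, the node names and the disk figures are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    disk_used: float  # fraction of disk in use, 0.0 to 1.0 (illustrative values)
    shards: dict = field(default_factory=dict)  # index name -> shard copies held


def blockers(node, index, low_watermark=0.85, total_shards_per_node=1):
    """Return the reasons this toy model refuses to put a copy of `index` on `node`."""
    reasons = []
    # Rule 1: the per-index cap on shard copies per node.
    if node.shards.get(index, 0) >= total_shards_per_node:
        reasons.append("total_shards_per_node")
    # Rule 2: the node's disk usage is above the low watermark.
    if node.disk_used > low_watermark:
        reasons.append("disk low watermark")
    return reasons


# Hypothetical three-node cluster: two nodes already hold a copy of the
# write-heavy index, the third has room for it under the cap but is too full.
nodes = [
    Node("node-1", 0.60, {"writes": 1}),
    Node("node-2", 0.55, {"writes": 1}),
    Node("node-3", 0.88, {"other": 4}),
]

report = {n.name: blockers(n, "writes") for n in nodes}
for name, reasons in report.items():
    print(name, "->", reasons)
```

Every node ends up with at least one blocker, so the replica stays unassigned until something frees space on the full node.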
I need to keep one shard per node and, as I understand it, disk space is not an important criterion for shard distribution (I think Elasticsearch mostly uses the load?).
The issue here is that Elasticsearch does not seem to mind that one of the nodes has reached the threshold. If it did, it could have moved some of the shards that can be moved (there are some) to regain space and allocate the missing shards.
Instead, it just stays in this state, with shards that will never be allocated unless I make room for them by relocating some shards manually.
You can see in the attached image that disk usage is bad on 2-3 nodes but fine on the others, and in theory they can all hold the same data (every node has the same roles).
Is there any way to make Elasticsearch do this kind of thing automatically?
One thing I can think of is to set the low watermark equal to the high watermark. That way, as soon as ES stops allocating to a node, it would also start relocating shards away from it so that allocation becomes possible again. Am I right, and if so, is there any downside to this?
Thanks for your answer.
I know about that, but from what I understand, Elasticsearch prioritizes the load criterion over the disk one. That is why, I guess, the cluster leaves the node above the low watermark even though that prevents the allocation of shards that can only go to that node: moving things would cause the load criterion to no longer be met.
My current configuration is the default: 85% low watermark and 90% high watermark.
The solution I am suggesting, and would like advice on regarding drawbacks, is based on the following:
The low watermark controls whether Elasticsearch can allocate shards to the node
The high watermark controls whether Elasticsearch will try to relocate shards away to free up space
So my guess is that by setting the low and high watermarks to the same value, the cluster will never (or only very briefly) stay in the state it is in right now: as soon as a node has no more room to accept shards, the cluster will immediately try to make some room.
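To illustrate the reasoning, here is a small sketch of how I understand the two watermarks. It is a simplified model under my own assumptions (the `disk_actions` function is hypothetical and ignores the flood stage watermark and every other allocation rule), not how Elasticsearch implements it.

```python
def disk_actions(disk_used_pct, low=85.0, high=90.0):
    """Simplified view of what the two watermarks do for one node."""
    return {
        # Above the low watermark, the node stops receiving new shards.
        "accepts_new_shards": disk_used_pct <= low,
        # Above the high watermark, shards are relocated away from the node.
        "relocates_shards_away": disk_used_pct > high,
    }


def is_stuck(disk_used_pct, low=85.0, high=90.0):
    """True when the node refuses new shards but nothing is moved away from it."""
    actions = disk_actions(disk_used_pct, low, high)
    return not actions["accepts_new_shards"] and not actions["relocates_shards_away"]


# Default watermarks: a node at 88% sits in the gap between low and high.
print(is_stuck(88.0))

# Same watermarks (85% / 85%): the gap disappears, so the node is either
# accepting shards or having shards relocated away from it.
print(is_stuck(88.0, low=85.0, high=85.0))
```

With the defaults, any usage between 85% and 90% is the "stuck" band; with equal values that band is empty.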
I have never tried that, as I generally try to keep a good amount of space free. I once had all disks above the low watermark and it was a constant battle to move stuff around. Everything gets slower because of that.
I see, thanks. In my case I still have a lot of disk space available on other nodes (see the capture in my first message), but because I have a max shards per node rule on a lot of my indices, disk usage is not distributed equally.
Anyway, I tried what I suggested: I set the high and low watermarks to the same value (85%), and so far the cluster looks stable and always relocates shards to avoid a node staying above 85%. I am still worried about drawbacks, but for now it looks OK.
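For reference, a minimal sketch of the kind of request body this corresponds to. The two setting names are the standard disk watermark cluster settings; the `watermark_settings` helper and the choice of the `persistent` scope are only assumptions for illustration, and the body would be sent with `PUT _cluster/settings`.

```python
import json


def watermark_settings(low, high, persistent=True):
    """Build a cluster settings body for the disk watermarks (hypothetical helper)."""
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.allocation.disk.watermark.low": low,
            "cluster.routing.allocation.disk.watermark.high": high,
        }
    }


# Both watermarks set to the same value, as described above.
body = watermark_settings("85%", "85%")

print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```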
One downside is that you are effectively forcing your cluster to keep reshuffling data around. And, from the free disk space shared in the first post, you have a pretty unbalanced cluster in terms of disk space used per node, so if all your docs are roughly equal in size, there is also an imbalance in docs per node.
But that only matters if it matters in your circumstances. If your cluster is performing well for your use case, then such an imbalance is fine IMO, if a bit inelegant.
We have no idea of your needs, but if, for example, you have a critical "today" index with one shard per node across, say, 3 nodes, plus 3 replica shards, then for search purposes you have reduced your effective cluster size to 6, and "which 6" might vary from day to day and even hour to hour as the cluster shuffles.