I have multiple data nodes in my cluster with uneven disk capacities. While data is continuously being indexed, some nodes reach the 90% watermark (other nodes in the cluster still have plenty of free disk space), and the cluster goes into a blocked state: index.blocks.read_only_allow_delete is set to true for all the indices, so no more data gets indexed.
How can I avoid this and make the most of the available disk space?
This is by design. If a node hits cluster.routing.allocation.disk.watermark.flood_stage (95% by default, not 90%) then Elasticsearch must stop writing to all the indices that have shards on that node. It cannot carry on writing to some of the shard copies on other nodes and just avoid the ones on the full node, because all shard copies must contain the same data.
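For reference, a quick way to see how full each node is and which indices have picked up the block (requests shown in Kibana Dev Tools style; adjust for curl if you prefer):

```
# Per-node disk usage and shard counts; disk.percent shows how close each node is to the watermarks
GET _cat/allocation?v

# Show only the read_only_allow_delete block, per index, if it has been applied
GET _all/_settings?filter_path=*.settings.index.blocks.read_only_allow_delete
```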
You can avoid this by giving Elasticsearch more space so it does not need to protect itself from a full disk by blocking indexing. You can also postpone the problem by increasing the flood_stage watermark.
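If you do raise the watermark temporarily and then free up or add disk space, something like the following is a minimal sketch (the 97% value is only an example; on recent versions the block is released automatically once usage drops back below the flood stage, on older versions you need to clear it yourself):

```
# Temporarily raise the flood-stage watermark (example value, not a recommendation)
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}

# Once disk space has been freed, remove the write block from all indices
PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```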
Thank you @DavidTurner, but my doubt is: why were the shards not moved away to a node with free disk space when usage reached cluster.routing.allocation.disk.watermark.high?
Normally that would be the case, but if disk usage grows too quickly then there might not be time to relocate enough shards away before hitting the flood_stage watermark.
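If your ingest rate is spiky, one possible mitigation (a sketch, not a guaranteed fix) is to widen the gap between the high and flood-stage watermarks and poll disk usage more often than the 30s default, giving relocation more time to react before the flood stage is reached:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "85%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.info.update.interval": "10s"
  }
}
```

The 85% and 10s values above are illustrative; the right numbers depend on your disk sizes and how fast your indices grow.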