Let's say the values of cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high are 85% and 90% respectively, and a data node's disk is 87% full. Shard allocation will not be allowed on this node because it has exceeded the low watermark. If this node is restarted for any reason, all of its shards will remain unassigned for the same reason. Elasticsearch will attempt to allocate some of the shards to other nodes, and the remaining shards will initialize on this node only once its disk usage drops below 85%. In the worst case, if freeing up that 2% of disk space takes longer than index.unassigned.node_left.delayed_timeout, shards will start allocating to other nodes, thereby increasing recovery time.

But if we set both watermarks to 90% (effectively meaning there is no separate low disk watermark), the situation would be as good as, if not better than, having the low disk watermark set to 85%: even after a data node restarts, all the shards would be initialized on the same node. Am I missing something here?
This is broadly true, but it does rely on the node in question being restarted for the problem to appear. Setting the low watermark higher really only defers the problem: there's not much space between 85% and 90%, and the same thing happens if you restart a node that is over the high watermark.
In the common case, where the node is not restarted while above the watermark, there is value in stopping the allocation of new shards without starting to actively move shards off the node, as would happen if both watermarks were set to 90%: the high disk usage is expected to be transient (e.g. due to ongoing merges, or because an index is about to be deleted), so temporarily relieving pressure on that node is the right thing to do.
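To make the three regimes concrete, here is a minimal Python sketch of the behaviour described in this thread. It is a simplified model for illustration only, not Elasticsearch's actual allocation logic: the function name `disk_allocation_state` and the state labels are made up for this example, and it assumes both watermarks are given as percentages and that a node is "over" a watermark when its usage is strictly greater.

```python
def disk_allocation_state(used_percent: float,
                          low: float = 85.0,
                          high: float = 90.0) -> str:
    """Classify a node by disk usage (hypothetical helper, simplified model).

    Returns one of:
      "accept_shards"  - at or below the low watermark: new shards may be allocated
      "no_new_shards"  - above low, at or below high: no new shards, existing ones stay
      "relocate_away"  - above high: shards are actively moved off the node
    """
    if not 0.0 <= used_percent <= 100.0:
        raise ValueError("used_percent must be between 0 and 100")
    if low > high:
        raise ValueError("low watermark must not exceed high watermark")

    if used_percent > high:
        return "relocate_away"
    if used_percent > low:
        return "no_new_shards"
    return "accept_shards"


# The scenario from the question: 87% used with watermarks at 85% / 90%.
print(disk_allocation_state(87.0))                         # no_new_shards
# With both watermarks at 90%, the middle band disappears entirely.
print(disk_allocation_state(87.0, low=90.0, high=90.0))    # accept_shards
print(disk_allocation_state(91.0, low=90.0, high=90.0))    # relocate_away
```

With the low and high watermarks set to the same value there is no "no_new_shards" band: a node goes straight from accepting new shards to having shards relocated away, which is the trade-off described above.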