Shards not allocating based on disk space


#1

I have a cluster with 9 nodes, all the same size. The shards are allocated evenly with regard to the number of shards per node, but not with regard to disk space. The documentation says, "Elasticsearch considers the available disk space on a node before deciding whether to allocate new shards to that node or to actively relocate shards away from that node", but that does not seem to be the case. The nodes range from 31% full to 71% full. I tried using the API to change the rebalancing, but that did not solve anything. Is there a way to allocate shards based on disk space (or shard size) rather than the number of shards? Thank you.


(David Turner) #2

That quote is from the page on the disk-based shard allocator, and the rest of that page describes the logic in much more detail. The goal is not to balance disk usage; it is to keep disk usage below the configured watermarks.
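For reference, the watermarks are ordinary cluster settings. Here is a minimal sketch in Python of a request body for `PUT _cluster/settings` that sets them. The three setting names come from the disk-based shard allocation documentation and the percentages are the documented defaults. The helper name `build_watermark_settings_body` is made up for illustration, and the sketch only builds the JSON; it does not talk to a cluster.

```python
import json

# Setting names as documented for the disk-based shard allocator.
# The values are the documented defaults, used here purely as an example.
WATERMARK_SETTINGS = {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
}


def build_watermark_settings_body(settings=WATERMARK_SETTINGS):
    """Hypothetical helper: wrap the settings in a persistent-scope body."""
    return {"persistent": dict(settings)}


if __name__ == "__main__":
    # Print the body that would be sent with PUT _cluster/settings.
    print(json.dumps(build_watermark_settings_body(), indent=2))
```

Changing these values moves the thresholds, but it does not make the allocator balance disk usage across nodes.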


#3

Is there a way to balance disk usage?


(David Turner) #4

No, not really. I'm not sure I understand why you would want to do this. It would potentially lead to a lot of unnecessary shard movement as the shards grow over time. Can you explain in a bit more detail what problem you're looking to solve with this feature?


#5

We often run into the issue where one node goes above the watermark, which causes its shards to become unallocated, after which they cannot be reallocated. When this happens, we still have other nodes that are below the watermark.


(David Turner) #6

This is surprising to me. Shards are not normally deallocated when a node goes above a watermark. If you exceed the low watermark then nothing happens to existing shards; if you exceed the high watermark then shards are moved elsewhere, but they remain allocated on their current node until the relocation is complete; if you exceed the flood stage watermark then the indices with shards on that node are marked as read-only, but the shards stay allocated to their current node. I'd like to understand the sequence of events that leads from a full disk to an unassigned shard in more detail. Do you have logs of a case where this happened?
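To make the three cases concrete, here is a rough sketch in Python that maps a node's disk usage to the behaviour described above. It assumes the documented default thresholds (85%, 90%, 95%), which your cluster may override. The names `watermark_stage` and `EFFECTS` are hypothetical, and this is a simplified model rather than the allocator's actual code.

```python
# Assumed thresholds: the documented defaults, in percent of disk used.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0
FLOOD_STAGE_WATERMARK = 95.0

# Simplified summary of what happens to shards already on the node.
EFFECTS = {
    "below_low": "no restrictions from the disk-based allocator",
    "above_low": "nothing happens to existing shards",
    "above_high": "shards are moved elsewhere, staying put until relocation completes",
    "above_flood_stage": "data is marked read-only, shards stay allocated",
}


def watermark_stage(used_percent, low=LOW_WATERMARK, high=HIGH_WATERMARK,
                    flood=FLOOD_STAGE_WATERMARK):
    """Return which watermark, if any, the given disk usage exceeds."""
    if not 0.0 <= used_percent <= 100.0:
        raise ValueError("used_percent must be between 0 and 100")
    # Check the most severe threshold first.
    if used_percent > flood:
        return "above_flood_stage"
    if used_percent > high:
        return "above_high"
    if used_percent > low:
        return "above_low"
    return "below_low"


if __name__ == "__main__":
    for used in (31, 71, 87, 92, 96):
        stage = watermark_stage(used)
        print(f"{used}% used -> {stage}: {EFFECTS[stage]}")
```

Under these assumed defaults, both 31% and 71% are below the low watermark, so the disk-based allocator has no reason to move anything between those nodes. None of the stages leaves a shard unassigned.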