We run Elasticsearch 1.7.4 on a fleet of approximately 200 nodes.
The disk watermarks are set to 80% (watermark.low) and 85% (watermark.high).
Around 5-8% of the fleet has disk usage between 80% and 85%, and about 5% of nodes are at only 20-30% disk usage. When a node with ~79-83% disk usage goes down and then comes back up, its shards are not allocated back to it; instead they get spread across the rest of the fleet. The delayed allocation timeout is currently 10 minutes.
It looks like when the node comes back, ES tries to assign the biggest shard from that node back to it, but it adds that shard's size to the node's current disk usage. The sum exceeds the low watermark threshold, so ES doesn't assign anything back to the node.
Is this the expected behavior? I thought ES shouldn't add a shard's size to the disk usage when that shard's data is already on disk and therefore already counted in the disk usage, should it?
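
To make the arithmetic concrete, here is a minimal sketch of what I suspect is happening. It is only a model of the behavior described above, not Elasticsearch's actual code: the function names, the disk and shard sizes, and the exact comparison at the threshold are all assumptions chosen for illustration.

```python
# Illustrative model only: names and numbers are hypothetical,
# not taken from Elasticsearch's source or from our cluster.
LOW_WATERMARK_PCT = 80.0
GB = 1024 ** 3


def usage_pct(used_bytes, total_bytes):
    """Disk usage as a percentage of total capacity."""
    return 100.0 * used_bytes / total_bytes


def allowed_with_double_counting(used_bytes, shard_bytes, total_bytes):
    """Suspected behavior: the shard size is added on top of current usage,
    even though the shard's files are already on disk."""
    return usage_pct(used_bytes + shard_bytes, total_bytes) < LOW_WATERMARK_PCT


def allowed_without_double_counting(used_bytes, shard_bytes, total_bytes):
    """Behavior I expected: the shard is already part of the used space,
    so only current usage is compared with the low watermark."""
    return usage_pct(used_bytes, total_bytes) < LOW_WATERMARK_PCT


total = 1000 * GB   # assumed disk size
used = 790 * GB     # 79% used, including the shard's existing files
shard = 50 * GB     # assumed size of the biggest shard on the node

print(allowed_with_double_counting(used, shard, total))     # 79% + 5% = 84%
print(allowed_without_double_counting(used, shard, total))  # stays at 79%
```

With these assumed numbers the first check rejects the node, because 84% is above the 80% low watermark, while the second accepts it, because the usage really stays at 79%.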
Thanks!