We have a 27-data-node ES 5.5 cluster with 1.7TB of disk per node. The ES team has created an amazingly easy-to-use, high-performance system. Our cluster normally works beautifully and ingests over a billion logs per day.
We're trying to run the cluster with less free space than we've maintained in the past (all nodes currently have between 13% and 15% free) and have run into a problem.
Occasionally one node becomes imbalanced with a few large, active shards, and its free space drops below 5%. That triggers high-watermark shard relocations. ES then schedules too many moves off this node, which in turn causes high-watermark relocations on other nodes. This can cascade out of control like a runaway reactor.
Our shard sizes vary widely: some indices have shards under 0.1gb, while others have shards around 40gb. 2672 active shards use 40TB of disk. To get reasonable allocation with such diverse shard sizes, we set cluster.routing.allocation.balance.shard to 0 so that each index's shards are spread evenly across nodes. (This also greatly reduced hot spots in the cluster.)
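For context, here is a quick back-of-the-envelope calculation in Python using the rounded numbers above. It assumes binary units (1TB = 1024GB); only the average shard size depends on that, the percentages don't. It shows how little headroom there is: one 40gb shard is about 2.3% of a node's disk, so three or four of them landing or growing on one node is enough to take it from 13% free to under 5%.

```python
# Back-of-the-envelope check using the rounded figures from this post.
# Assumption: "TB"/"gb" are binary units (1 TB = 1024 GB).
NODES = 27
DISK_PER_NODE_GB = 1.7 * 1024   # 1.7TB per data node
USED_GB = 40 * 1024             # 40TB used by all active shards
SHARDS = 2672                   # active shards
LARGE_SHARD_GB = 40             # size of our largest shards

avg_shard_gb = USED_GB / SHARDS                                       # ~15.3 GB
shards_per_node = SHARDS / NODES                                      # ~99 shards
cluster_free_pct = 100 * (1 - USED_GB / (NODES * DISK_PER_NODE_GB))   # ~12.9% free
large_shard_pct = 100 * LARGE_SHARD_GB / DISK_PER_NODE_GB             # ~2.3% of one node

# Number of 40gb shards that takes a node from 13% free to 5% free.
shards_to_5pct = (13 - 5) / large_shard_pct                           # ~3.5

for label, value in [
    ("average shard size (GB)", avg_shard_gb),
    ("shards per node", shards_per_node),
    ("cluster-wide free (%)", cluster_free_pct),
    ("one 40gb shard as % of a node", large_shard_pct),
    ("40gb shards to go from 13% to 5% free", shards_to_5pct),
]:
    print(f"{label}: {value:.1f}")
```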
We'd like ES to perform the minimal set of relocations needed to bring an over-full node back below the high watermark.
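To make "minimal" concrete, here is a rough Python sketch. `fewest_shards_to_move` is a hypothetical helper, not anything in ES, and it assumes the default 90% high watermark. Taking the largest shards first minimizes the number of relocations needed to get back under the watermark, though not necessarily the number of bytes moved.

```python
from typing import Dict, List


def fewest_shards_to_move(
    used_gb: float,
    capacity_gb: float,
    shard_sizes_gb: Dict[str, float],
    high_watermark: float = 0.90,  # assumed: fraction of disk used, ES default 90%
) -> List[str]:
    """HYPOTHETICAL helper (not ES code): pick the fewest shards whose removal
    brings the node to or below the high watermark."""
    excess_gb = used_gb - high_watermark * capacity_gb
    chosen: List[str] = []
    # Largest first: each move frees as much space as possible, which
    # minimizes the number of shards that have to move.
    for shard, size in sorted(shard_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        if excess_gb <= 0:
            break
        chosen.append(shard)
        excess_gb -= size
    if excess_gb > 0:
        raise ValueError("moving every shard still leaves the node above the watermark")
    return chosen


# A 1700GB node at 1620GB used (under 5% free) with a few shards (names made up).
print(fewest_shards_to_move(1620, 1700, {"logs-a[0]": 40, "logs-b[3]": 38,
                                         "logs-c[1]": 25, "metrics[2]": 0.1}))
# → ['logs-a[0]', 'logs-b[3]', 'logs-c[1]']
```

Three moves free about 103GB, just enough to get under 1530GB (90% of 1700GB); nothing else on the node needs to go anywhere.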
We'd also like a variant of BalancedShardsAllocator that takes free disk space into account, so that shards are preferentially allocated to nodes with more free disk.
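As far as we can tell from the source, BalancedShardsAllocator's weight for a node is theta0 * (shards on node - average shards per node) + theta1 * (shards of the index on node - average shards of the index per node), with theta0 and theta1 derived from balance.shard and balance.index (default 0.55). With balance.shard at 0, only the index term counts. The Python sketch below mirrors our reading of that and adds a hypothetical `disk_balance` term (not a real ES setting) that penalizes nodes using more disk than the cluster average. The `Node` class and `weight` function are our own illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    shards_per_index: Dict[str, int]  # index name -> shard copies on this node
    used_gb: float
    capacity_gb: float

    @property
    def num_shards(self) -> int:
        return sum(self.shards_per_index.values())

    @property
    def used_fraction(self) -> float:
        return self.used_gb / self.capacity_gb


def weight(
    node: Node,
    index: str,
    nodes: List[Node],
    shard_balance: float = 0.0,   # we run with balance.shard = 0
    index_balance: float = 0.55,  # ES default for balance.index
    disk_balance: float = 0.0,    # HYPOTHETICAL: not an ES setting
) -> float:
    """Lower weight = better home for another shard of `index`.

    The shard and index terms follow our reading of WeightFunction in
    BalancedShardsAllocator; the disk term is our proposed addition."""
    total = shard_balance + index_balance + disk_balance
    n = len(nodes)
    avg_shards = sum(x.num_shards for x in nodes) / n
    avg_index = sum(x.shards_per_index.get(index, 0) for x in nodes) / n
    avg_used = sum(x.used_fraction for x in nodes) / n

    w_shard = node.num_shards - avg_shards
    w_index = node.shards_per_index.get(index, 0) - avg_index
    # Disk usage in percentage points above/below the cluster average; the
    # x100 scaling (to be comparable with shard counts) is an arbitrary choice.
    w_disk = 100 * (node.used_fraction - avg_used)

    return (shard_balance * w_shard + index_balance * w_index
            + disk_balance * w_disk) / total
```

With two nodes holding the same number of shards of an index, both weights are equal when `disk_balance` is 0; any positive `disk_balance` makes the emptier node the preferred target.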
When a node runs out of disk, it would be great to have a small number of shards move from it to the nodes with the most free disk.
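The destination side could look something like this Python sketch. `plan_moves` is again a hypothetical helper, not ES code. Each shard to be moved goes to the node with the most free disk, skipping any node it would push past the low watermark (assumed 85%, the ES default), so the receiving node doesn't become the next one to shed shards.

```python
from typing import Dict, List, Tuple


def plan_moves(
    shard_sizes_gb: Dict[str, float],
    used_gb: Dict[str, float],      # candidate target nodes only (exclude the source)
    capacity_gb: Dict[str, float],
    low_watermark: float = 0.85,    # assumed: fraction of disk used
) -> List[Tuple[str, str]]:
    """HYPOTHETICAL helper: assign each shard to the node with the most free
    disk, skipping nodes the shard would push above the low watermark."""
    used = dict(used_gb)  # copy so pending moves are accounted for
    plan: List[Tuple[str, str]] = []
    # Place the largest shards first while the most room is available.
    for shard, size in sorted(shard_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [n for n in used if used[n] + size <= low_watermark * capacity_gb[n]]
        if not candidates:
            raise ValueError(f"no node can take {shard} without crossing the low watermark")
        target = max(candidates, key=lambda n: capacity_gb[n] - used[n])
        used[target] += size
        plan.append((shard, target))
    return plan


print(plan_moves({"a": 40, "b": 38, "c": 25},
                 {"n1": 1300, "n2": 1350, "n3": 1440},
                 {"n1": 1700, "n2": 1700, "n3": 1700}))
# → [('a', 'n1'), ('b', 'n1'), ('c', 'n2')]
```

Node n3 is never chosen because it is already close to the low watermark, which is exactly the kind of node we'd like relocations to avoid.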
Has anyone else seen this behavior? Are there any cluster configurations that would help eliminate these relocation storms?
Does anyone know of work being done to make balancing more disk-aware? Do any ES versions behave better than 5.5 here?
Thanks in advance!