Today, we can exclude certain nodes from allocation using FilterDecider
settings, which is a way to decommission nodes by rack or by some other attribute.
If the ES process restarts on such a node, its primaries still get assigned, because GatewayAllocator can call `canForceAllocatePrimary()` and get a YES decision when possible.
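To make the primary path concrete, here is a minimal sketch of the behaviour described above. It is a toy model and not the Elasticsearch source: `ForceAllocateSketch`, its nested `FilterDecider`, and the `decide` helper are simplified stand-ins written for illustration, and only the method name `canForceAllocatePrimary` comes from the text.

```java
import java.util.Set;

// Toy model only: these are NOT the real Elasticsearch classes.
class ForceAllocateSketch {
    enum Decision { YES, NO }

    // Models an exclusion filter: excluded nodes are rejected by normal allocation.
    static class FilterDecider {
        private final Set<String> excludedNodes;

        FilterDecider(Set<String> excludedNodes) {
            this.excludedNodes = excludedNodes;
        }

        Decision canAllocate(String nodeId) {
            return excludedNodes.contains(nodeId) ? Decision.NO : Decision.YES;
        }

        // Assumption: the filter never vetoes a forced primary allocation,
        // so the existing on-disk copy can be reused.
        Decision canForceAllocatePrimary(String nodeId) {
            return Decision.YES;
        }
    }

    // Decision for placing a shard back on the node that holds its on-disk copy.
    static Decision decide(FilterDecider decider, boolean primary, String nodeWithCopy) {
        Decision normal = decider.canAllocate(nodeWithCopy);
        if (normal == Decision.YES || !primary) {
            return normal; // replicas have no force path in this model
        }
        return decider.canForceAllocatePrimary(nodeWithCopy);
    }

    public static void main(String[] args) {
        FilterDecider decider = new FilterDecider(Set.of("node-1"));
        System.out.println(decide(decider, true, "node-1"));  // prints YES
        System.out.println(decide(decider, false, "node-1")); // prints NO
    }
}
```

In this model the excluded node gets its primary back after a restart, while a replica with an equally valid copy on the same node is rejected.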
However, replica shards have no such mechanism. This causes yellow clusters and leaves the cluster vulnerable to data loss due to under-replication. Since primaries are already being moved off the node, the new replica recovery is often throttled by that node's outgoing recovery limits, which keeps the cluster yellow for some time (widening the window of low durability).
Should ES have a mechanism for allocating replica shards as well, similar to force allocating primaries? This would only apply to in-sync shard copies, and only to shards that were unassigned due to node left or cluster recovered.
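The eligibility rule proposed here could be sketched roughly as follows. This is a hypothetical illustration and not an existing Elasticsearch API: the class, the `canForceAllocateReplica` method, and its parameters are invented for this sketch. The enum is a local stand-in whose first two values mirror the two reasons named above, and the other two are included only for contrast.

```java
import java.util.Set;

// Hypothetical sketch of the proposal; none of these names exist in Elasticsearch.
class ForceAllocateReplicaSketch {
    // Local stand-in for the reason a shard became unassigned.
    enum UnassignedReason { NODE_LEFT, CLUSTER_RECOVERED, INDEX_CREATED, ALLOCATION_FAILED }

    /**
     * Returns true if a replica may be force allocated to a node that holds a copy.
     *
     * @param allocationIdOnNode  allocation ID of the copy found on the node, or null if none
     * @param inSyncAllocationIds allocation IDs the cluster considers in-sync for this shard
     * @param reason              why the replica is currently unassigned
     */
    static boolean canForceAllocateReplica(String allocationIdOnNode,
                                           Set<String> inSyncAllocationIds,
                                           UnassignedReason reason) {
        // Only in-sync copies qualify; a stale copy still needs a normal recovery.
        boolean inSync = allocationIdOnNode != null
                && inSyncAllocationIds.contains(allocationIdOnNode);
        // Only the two unassigned reasons named in the proposal qualify.
        boolean eligibleReason = reason == UnassignedReason.NODE_LEFT
                || reason == UnassignedReason.CLUSTER_RECOVERED;
        return inSync && eligibleReason;
    }

    public static void main(String[] args) {
        Set<String> inSync = Set.of("alloc-a", "alloc-b");
        System.out.println(canForceAllocateReplica("alloc-a", inSync, UnassignedReason.NODE_LEFT));     // prints true
        System.out.println(canForceAllocateReplica("alloc-z", inSync, UnassignedReason.NODE_LEFT));     // prints false
        System.out.println(canForceAllocateReplica("alloc-a", inSync, UnassignedReason.INDEX_CREATED)); // prints false
    }
}
```

Keeping both conditions means a restarted node can only reclaim a replica copy that was in-sync when it left, so the filter settings still block every other replica allocation to that node.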