Should GatewayAllocator force allocate replicas for node/process restart cases?

Today, we can exclude certain nodes from allocation using FilterDecider settings, for example to decommission all nodes on a rack or all nodes with a given attribute.

If the ES process restarts on such a node, the primaries still get assigned, because GatewayAllocator can call canForceAllocatePrimary() and get a YES decision when possible.

However, replica shards have no such mechanism. This leaves the cluster yellow and vulnerable to data loss due to under-replication. Since primaries are already being moved off that node, the new replica recovery is often throttled by the node's outgoing recovery limits, which keeps the cluster yellow for some time and widens the window of low durability.

Should ES have a mechanism, similar to force allocating primaries, for allocating replica shards as well? This would only apply to in-sync shard copies, and only to shards that became unassigned because a node left or the cluster recovered.
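
To make the proposed rule concrete, here is a minimal sketch of the check I have in mind. It is not Elasticsearch code: the class, the enums and the canForceAllocateReplica method are all hypothetical names made up for illustration, and a real decider would have more inputs and more decision types than YES/NO.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class ReplicaForceAllocationSketch {

    // Hypothetical stand-in for the reason a shard became unassigned.
    enum UnassignedReason { NODE_LEFT, CLUSTER_RECOVERED, INDEX_CREATED, ALLOCATION_FAILED }

    // Hypothetical two-valued decision, simplified for this sketch.
    enum Decision { YES, NO }

    // Reasons for which the proposal would allow a forced replica allocation.
    private static final Set<UnassignedReason> FORCE_ELIGIBLE =
            EnumSet.of(UnassignedReason.NODE_LEFT, UnassignedReason.CLUSTER_RECOVERED);

    // Returns YES only when the on-disk copy is in-sync AND the shard became
    // unassigned because a node left or the cluster recovered.
    static Decision canForceAllocateReplica(String allocationId,
                                            UnassignedReason reason,
                                            Set<String> inSyncAllocationIds) {
        boolean inSync = inSyncAllocationIds.contains(allocationId);
        boolean eligibleReason = FORCE_ELIGIBLE.contains(reason);
        return (inSync && eligibleReason) ? Decision.YES : Decision.NO;
    }

    public static void main(String[] args) {
        Set<String> inSync = Collections.singleton("copy-1");
        // In-sync copy on a restarted node.
        System.out.println(canForceAllocateReplica("copy-1", UnassignedReason.NODE_LEFT, inSync));
        // Stale copy: not in the in-sync set.
        System.out.println(canForceAllocateReplica("copy-2", UnassignedReason.NODE_LEFT, inSync));
        // In-sync id, but the unassigned reason is not one of the two proposed cases.
        System.out.println(canForceAllocateReplica("copy-1", UnassignedReason.INDEX_CREATED, inSync));
    }
}
```

The intent is that the override stays narrow: a stale copy, or a replica that is unassigned for any other reason, would still go through the normal deciders and be rejected by the exclude filter.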
