Should GatewayAllocator force allocate replicas for node/process restart cases?

vigyas · May 15, 2020, 7:32pm

Today, we can exclude certain nodes from allocation using FilterDecider settings, which is a way to decommission all nodes on a given rack or with a given attribute.

If the ES process restarts on such a node, its primaries still get assigned, because GatewayAllocator can call canForceAllocatePrimary() and get a YES decision when possible.

However, replica shards have no such mechanism. This leaves the cluster yellow and vulnerable to data loss due to under-replication. Since primaries are already being moved off that node, recovery of the new replica is often throttled by the node's outgoing recovery limits, which keeps the cluster yellow for some time (widening the window of low durability).

Should ES have a mechanism, similar to force allocating primaries, for allocating replica shards as well? This would only apply to in-sync shard copies, and only to shards that became unassigned due to node left or cluster recovered.
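
To make the proposal concrete, here is a minimal sketch of the eligibility check I have in mind. It is not Elasticsearch code: every type and method below (ReplicaCopy, canForceAllocateReplica, the two enums) is hypothetical and heavily simplified, and all other deciders are ignored. It only shows the two conditions from the question: the copy must be in-sync, and the shard must be unassigned because of node left or cluster recovered.

```java
import java.util.Set;

// Hypothetical, simplified model. None of these types are real Elasticsearch classes.
final class ReplicaForceAllocationSketch {

    // Reasons a shard can be unassigned; only the first two qualify in this proposal.
    enum UnassignedReason { NODE_LEFT, CLUSTER_RECOVERED, INDEX_CREATED, REPLICA_ADDED }

    enum Decision { YES, NO }

    // A replica copy found on disk on a node that has just rejoined the cluster.
    static final class ReplicaCopy {
        final String allocationId;
        final String nodeName;
        final UnassignedReason reason;

        ReplicaCopy(String allocationId, String nodeName, UnassignedReason reason) {
            this.allocationId = allocationId;
            this.nodeName = nodeName;
            this.reason = reason;
        }
    }

    // Assumed analogue of canForceAllocatePrimary() for replicas.
    static Decision canForceAllocateReplica(ReplicaCopy copy,
                                            Set<String> inSyncAllocationIds,
                                            Set<String> excludedNodeNames) {
        // Nodes that are not excluded by the filter need no forcing at all.
        if (!excludedNodeNames.contains(copy.nodeName)) {
            return Decision.YES;
        }
        boolean inSync = inSyncAllocationIds.contains(copy.allocationId);
        boolean restartCase = copy.reason == UnassignedReason.NODE_LEFT
                || copy.reason == UnassignedReason.CLUSTER_RECOVERED;
        // On an excluded node, force only in-sync copies from a restart case.
        return (inSync && restartCase) ? Decision.YES : Decision.NO;
    }

    public static void main(String[] args) {
        Set<String> inSync = Set.of("alloc-1");
        Set<String> excluded = Set.of("node-1");

        ReplicaCopy restarted = new ReplicaCopy("alloc-1", "node-1", UnassignedReason.NODE_LEFT);
        ReplicaCopy stale = new ReplicaCopy("alloc-9", "node-1", UnassignedReason.NODE_LEFT);

        System.out.println(canForceAllocateReplica(restarted, inSync, excluded));
        System.out.println(canForceAllocateReplica(stale, inSync, excluded));
    }
}
```

With a rule like this, the in-sync copy on the restarted (but excluded) node could be reused in place, while a stale copy would still wait for a normal recovery elsewhere.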

