Decommissioning a node using cluster.routing.allocation.exclude._name

linkerc · September 28, 2021, 12:16am

The problem was resolved after we shut down the node being decommissioned, which caused the replica to be promoted to primary.
Some more background on our experience:
We wanted to decommission the node because AWS notified us of a potential hardware failure on it. I believe the shard relocation failed to finish because of a disk drive failure: I also got an I/O termination error when copying a particular file off the failing AWS node with scp. This issue could have been avoided if the primary shard had been switched to the replica as the first step.

We will upgrade to the latest version ASAP. But please take this suggestion into consideration for future releases if it has not been done already.

Once a node is marked as excluded, the cluster should immediately move all primary shards off that node.
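For anyone following along, here is a rough sketch of the two pieces involved: the settings body that excludes a node by name, and a check for which primaries are still sitting on it. It assumes the body is sent with PUT _cluster/settings and that the shard rows come from GET _cat/shards?format=json. The function names and the node name "node-3" are made up for illustration, and the handling of the "source -> target" node column for relocating shards is my assumption about that output, so verify it against your version.

```python
import json


def exclude_node_body(node_name, persistent=True):
    """Build the JSON body for PUT _cluster/settings that excludes a node by name."""
    scope = "persistent" if persistent else "transient"
    return json.dumps(
        {scope: {"cluster.routing.allocation.exclude._name": node_name}}
    )


def shards_still_on(node_name, cat_shards_rows):
    """Split the shards still located on node_name into (primaries, replicas).

    cat_shards_rows is the parsed output of GET _cat/shards?format=json.
    Assumption: for a relocating shard the "node" column reads
    "source -> target ...", so only the part before " -> " is compared.
    """
    primaries, replicas = [], []
    for row in cat_shards_rows:
        # Unassigned shards have no node, so treat a missing value as "".
        node = (row.get("node") or "").split(" -> ")[0]
        if node != node_name:
            continue
        bucket = primaries if row.get("prirep") == "p" else replicas
        bucket.append((row["index"], row["shard"]))
    return primaries, replicas


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from a real cluster.
    sample = [
        {"index": "logs", "shard": "0", "prirep": "p", "node": "node-3"},
        {"index": "logs", "shard": "0", "prirep": "r", "node": "node-1"},
    ]
    print(exclude_node_body("node-3"))
    print(shards_still_on("node-3", sample))
```

If the first list is still non-empty long after the exclude setting was applied, primaries are stuck on the node, which is the situation we ran into.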

linked topic: Weird rebalancing strategy - #3 by linkerc
