Why isn't the primary shard moved away from the node being decommissioned?
When I use `cluster.routing.allocation.exclude._name` to exclude a node from participating in the cluster (so it can eventually be removed), I noticed that I still had a primary shard located on the node being prepared for decommissioning. That shard was marked as "relocating".
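For reference, this is roughly how we applied the exclusion via the cluster settings API (the node name `node-1` here is just a placeholder for the node we wanted to drain):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node-1"
  }
}
```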
Wouldn't it make sense to promote the replica shard on a different node to primary in such a scenario?
That way all writes would go to one of the other participating nodes instead of the one being excluded.
I'm running version 7.3.1.
I'm seeing longer rebalancing times compared to before, and I'm not sure if this could be the culprit.
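For anyone debugging something similar: you can check which primaries are still sitting on (or relocating from) the excluded node with the cat shards API, which should show any shards in the `RELOCATING` state:

```
GET _cat/shards?v&h=index,shard,prirep,state,node
```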
The issue was resolved after we shut down the node being decommissioned, which caused the replica to be promoted to primary.
Some more background into our experience:
The reason we wanted to decommission the node was an AWS notification about potential hardware failure, and I believe the shard relocation failed to finish precisely because of a disk drive failure: I also encountered an I/O termination failure when using scp to copy a particular file off the failing AWS node. This issue could have been avoided if the primary role had been switched to the replica as the first step.
We will upgrade to the latest version ASAP, but please take this suggestion into consideration for future releases if it has not been implemented already.
Once a node is marked as excluded, the cluster should move all primary shards off that node immediately.