Why isn't the primary shard moved away from the node being decommissioned?
When I use `cluster.routing.allocation.exclude._name` to exclude a node from participating in the cluster (so it can eventually be removed), I noticed that I still had a primary shard located on the node being prepared for decommissioning. That shard was marked as "relocating".
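For reference, this is roughly how we applied the exclusion via the cluster settings API (the node name `node-1` here is just a placeholder for the node we wanted to drain):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node-1"
  }
}
```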
Wouldn't it make sense to promote the replica shard on a different node to primary in such a scenario?
That way all writes would go to one of the other participating nodes instead of the one being excluded.
I'm running version 7.3.1.
I'm seeing longer rebalancing times compared to before, and I'm not sure if this could be the culprit.
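For anyone debugging something similar: you can check which primaries are still sitting on (or relocating from) the excluded node with the cat shards API, which should show any shards in the `RELOCATING` state:

```
GET _cat/shards?v&h=index,shard,prirep,state,node
```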
The issue was resolved after we shut down the node being decommissioned, which caused the replica to be promoted to primary.
Some more background into our experience:
The reason we wanted to decommission the node was an AWS notification about potential hardware failure, and I believe the shard relocation failed to finish precisely because of a disk drive failure: I also encountered an I/O termination failure when using scp to copy a particular file off the failing AWS node. This issue could have been avoided if the primary role had been switched to the replica as the first step.
We will upgrade to the latest version ASAP, but please take this suggestion into consideration for future releases if it has not been implemented already.
Once a node is marked as excluded, the cluster should move all primary shards off that node immediately.