I'm trying to remove a server from our Elasticsearch cluster. To do so, I would like to exclude that node from shard allocation and, once all of its shards have moved elsewhere, shut the server down.
I disabled allocation of shards to that node using the cluster.routing.allocation.exclude._name setting. The node held several smaller indices whose shards moved off successfully. However, the shards for the larger indices (about 160 GB) have not moved for over a day. When I examine the shard stats in Cerebro, the relocating_node keeps changing; it never settles on a particular target node and completes the move. The index in question has 5 shards and 2 replicas. There are currently 10 nodes in the cluster.
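For reference, here is a minimal sketch of how such an exclusion can be applied through the cluster settings API (PUT /_cluster/settings). The cluster address, the node name `node-7`, and the choice of a transient rather than persistent setting are all assumptions made for illustration; none of them come from the setup described above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster
NODE_NAME = "node-7"              # hypothetical name of the node being drained


def build_exclude_body(node_name, persistent=False):
    """Build the PUT /_cluster/settings body that excludes a node by name."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"cluster.routing.allocation.exclude._name": node_name}}


def apply_exclude(es_url, node_name):
    """Send the settings update; this needs a reachable cluster."""
    data = json.dumps(build_exclude_body(node_name)).encode("utf-8")
    req = urllib.request.Request(
        es_url + "/_cluster/settings",
        data=data,
        method="PUT",
        # Elasticsearch 6.x rejects request bodies without a Content-Type header.
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only print the request body here; call apply_exclude(ES_URL, NODE_NAME)
    # to actually send it.
    print(json.dumps(build_exclude_body(NODE_NAME), indent=2))
```

Setting the same key to null in a later request should remove the exclusion again.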
I am wondering whether this is a bug, or whether there is a configuration setting I need to change so that shard relocation stops thrashing. I'm not sure where to look to find out why it repeatedly decides to change relocating_node when all I want is for the shards to move off the node in question.
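One possible way to watch the thrashing outside Cerebro is to poll the _cat/shards API and, for a shard that keeps changing target, ask the cluster allocation explain API for the allocation deciders' reasoning. The sketch below assumes a cluster at `http://localhost:9200` and a hypothetical index name `big-index`; the sample rows are made up to mirror the JSON shape of _cat/shards and are not real output from this cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster
INDEX = "big-index"               # hypothetical name of the large index


def relocating_shards(rows):
    """Keep only the shards in state RELOCATING from _cat/shards JSON rows."""
    return [
        (row["shard"], row["prirep"], row["node"])
        for row in rows
        if row.get("state") == "RELOCATING"
    ]


def explain_body(index, shard, primary):
    """Request body for the /_cluster/allocation/explain endpoint."""
    return {"index": index, "shard": shard, "primary": primary}


def fetch_cat_shards(es_url, index):
    """Fetch shard rows as JSON; this needs a reachable cluster."""
    url = (es_url + "/_cat/shards/" + index
           + "?format=json&h=index,shard,prirep,state,node")
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up rows in the shape returned by _cat/shards with format=json.
    sample = [
        {"index": INDEX, "shard": "0", "prirep": "p",
         "state": "STARTED", "node": "node-2"},
        {"index": INDEX, "shard": "1", "prirep": "r",
         "state": "RELOCATING", "node": "node-7 -> 10.0.0.3 abc123 node-3"},
    ]
    for shard, prirep, node in relocating_shards(sample):
        # For a relocating shard the node column shows source and target.
        print(shard, prirep, node)
        print(json.dumps(explain_body(INDEX, int(shard), prirep == "p")))
```

Running `relocating_shards(fetch_cat_shards(ES_URL, INDEX))` every few seconds would show whether the target in the node column keeps changing, and sending `explain_body(...)` to /_cluster/allocation/explain should return the per-node decisions for that shard copy.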
I am running Elasticsearch 6.2.4 if that helps. It'd be useful to know if this has been fixed in a later version.