Shards refuse to relocate to different nodes using cluster.routing.allocation.exclude

Hi,

I'm trying to remove a server from our Elasticsearch cluster. To do so, I would like to disable shard allocation to that node and, once all the shards have moved off, shut the server down.

I disabled allocation of shards to that node using the cluster.routing.allocation.exclude._name setting. Several smaller indices successfully moved off the node; however, the shards for the larger indices (about 160 GB) have not moved for over a day. When I examine the shard stats in Cerebro, the relocating_node value constantly changes without ever settling on a particular node and actually moving. The index in question has 5 shards with 2 replicas, and there are currently 10 nodes in the cluster.
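For reference, the exclusion itself is just a cluster settings update along these lines; the snippet below is only an illustrative Python sketch, and the node name and localhost:9200 endpoint are placeholders for our real values:

```python
import requests

ES = "http://localhost:9200"   # placeholder: adjust to your cluster address
NODE = "node-to-remove"        # placeholder: name of the node being drained

# Exclude the node from shard allocation; existing shards should then move off it.
resp = requests.put(
    f"{ES}/_cluster/settings",
    json={"transient": {"cluster.routing.allocation.exclude._name": NODE}},
)
resp.raise_for_status()
print(resp.json())
```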

I am wondering whether this is a bug, or whether there is some configuration setting I need to change so that the shard relocation doesn't keep thrashing. I'm not sure where to look to see why it repeatedly changes relocating_node when all I want is for the shards to move off the node in question.

I am running Elasticsearch 6.2.4 if that helps. It'd be useful to know if this has been fixed in a later version.

You said the smaller indices moved off successfully but the larger indices repeatedly change relocating_node. I suspect the other nodes don't have enough disk space to accept the shards of that large index. There should be more information in elasticsearch.log; you can check it there.
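As a starting point, something like the sketch below might help: it prints per-node disk usage via the _cat/allocation API and asks the allocation explain API why a given shard copy is not settling. The index name, shard number, and localhost endpoint are placeholders; adjust them to your cluster.

```python
import requests

ES = "http://localhost:9200"   # placeholder: adjust to your cluster address

# Per-node disk usage: a stuck relocation is often a disk-watermark problem.
print(requests.get(f"{ES}/_cat/allocation", params={"v": "true"}).text)

# Ask the cluster directly why a given shard copy is not settling.
# The index name and shard number are placeholders for the real ones.
explain = requests.get(
    f"{ES}/_cluster/allocation/explain",
    json={"index": "my-large-index", "shard": 0, "primary": True},
)
print(explain.json())
```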

I have solved this.

It looks like the machines it was attempting to move the shards to (which are new) were failing to allocate them because they were missing an on-disk file necessary for creating the index (this is particular to our mapping file). This caused ES to thrash with allocation errors. I didn't see that until I checked the logs on the master, since the target machines didn't seem to be reporting the errors.
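For anyone else hitting this, a rough sketch like the one below is one way to watch the drain: it lists shard states and flags anything that is not started or is still on the excluded node. The node name and endpoint are placeholders.

```python
import requests

ES = "http://localhost:9200"      # placeholder: adjust to your cluster address
DRAINING = "node-to-remove"       # placeholder: the excluded node's name

# List every shard with its state and node; anything not STARTED, or still
# sitting on the draining node, is worth a closer look.
shards = requests.get(
    f"{ES}/_cat/shards",
    params={"format": "json", "h": "index,shard,prirep,state,node"},
).json()

for s in shards:
    node = s.get("node") or ""
    if s["state"] != "STARTED" or DRAINING in node:
        print(s["index"], s["shard"], s["prirep"], s["state"], node)
```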

