Undo decommissioning of a node

On another, much smaller 1.7 cluster, I was able to exclude a node, watch it drain, set exclude._name back to "", and then see shards immediately get relocated back onto that node. So the exclusion filter itself is apparently not the problem.
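
For reference, this is roughly what that cycle looked like, sketched with the Python client (the host and node name are placeholders, and I'm assuming a 1.x-era elasticsearch-py where settings are passed as a request body):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

# Exclude the node by name so its shards drain onto the rest of the cluster.
es.cluster.put_settings(body={
    "transient": {"cluster.routing.allocation.exclude._name": "data-node-3"}
})

# ... wait until nothing is left on data-node-3 ...

# Clear the exclusion; on the small cluster, shards started relocating
# back onto the node right away.
es.cluster.put_settings(body={
    "transient": {"cluster.routing.allocation.exclude._name": ""}
})
```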

On the other two, larger clusters (400 indices, 3000 shards, 12 nodes), I think something else is going on, possibly due to the forced awareness of AZs and/or the sheer size of the cluster in terms of shard count. Even with 12 data nodes and everything pretty well balanced AFAICT, one cluster is still constantly relocating/replicating shards. And it's not all in one direction either: sometimes a node receives a few hundred GB of data only to have another few hundred GB moved away immediately after. It's unclear to me how the rebalancing is being decided, or how long it will take to quiesce.
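
In case it helps to show how I'm watching the churn: a minimal polling sketch, again assuming the 1.x-era Python client and a placeholder host; the 30-second interval is arbitrary.

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

while True:
    health = es.cluster.health()
    relocating = health["relocating_shards"]
    initializing = health["initializing_shards"]
    print("relocating=%d initializing=%d" % (relocating, initializing))

    # _cat/shards marks in-flight moves as RELOCATING, which at least shows
    # which indices and nodes are involved in the back-and-forth.
    for line in es.cat.shards(v=True).splitlines():
        if "RELOCATING" in line:
            print(line)

    if relocating == 0 and initializing == 0:
        break  # the cluster has (for the moment) stopped moving shards
    time.sleep(30)
```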

At any rate, this seems to be an issue with the rebalancing heuristics rather than with shard allocation filtering.