I am in the process of shrinking a 1.7 cluster as indices get moved over to a 6.1 cluster. I shrank it from 35 nodes to 8 using shard allocation filtering, and this worked just great. I then marked another 3 nodes for draining, but halfway through I realized we still need them for a few more days. How do I undo this?
Based on this and this, I tried to un-decommission them by clearing the exclusion filter:

```
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient" : {
    "cluster.routing.allocation.exclude._name" : ""
  }
}
'
```
but this didn't stop the shards from being relocated. Then I tried filtering against a nonsense name, but that didn't work either:

```
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient" : {
    "cluster.routing.allocation.exclude._name" : "halp"
  }
}
'
```
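For what it's worth, the settings the cluster is actually applying can be inspected like this (the response returns both the `persistent` and `transient` blocks, so a filter lingering in the persistent block would show up here too):

```shell
# Show the cluster-level settings currently in effect.
# Persistent settings survive restarts; transient ones should not.
curl -XGET 'localhost:9200/_cluster/settings?pretty'
```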
Then I did a full restart of the cluster, bringing all nodes down, expecting that to reset the transient setting. After the restart, it's still trying to drain those 3 nodes.
The only way I've found to stop it is to set artificially high disk watermarks, but there has to be a better way, even on ES 1.7.
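For reference, the watermark workaround I'm using looks roughly like this (the 99% values are just what I picked to effectively disable the disk-based checks):

```shell
# Raise the disk watermarks so disk-based allocation decisions
# stop forcing shards off the nodes. Values are illustrative.
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient" : {
    "cluster.routing.allocation.disk.watermark.low" : "99%",
    "cluster.routing.allocation.disk.watermark.high" : "99%"
  }
}
'
```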
How can I stop this madness?