Safely repair (remove and add back) a data node in a cluster

We have an Elasticsearch cluster running with a few data nodes. Recently one of the data nodes developed disk issues, which we needed to fix.

To preserve the indices, I excluded the faulty data node from the cluster with:

curl -XPUT P.P.P.P:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" :{
      "cluster.routing.allocation.exclude._ip" : "X.X.X.X"
   }
}';echo

(This follows the Stack Overflow question "How to remove node from elasticsearch cluster on runtime without down time".)

Then I shut down the data node, fixed the disk, and powered it back on.

The problem is that, now that the machine is back up, the node fails to rejoin the cluster.

Is it because of the transient exclude setting above? I tried a similar setting, replacing exclude with include, but it had no effect.

Is there a way to disable or delete the transient setting above? Thanks.

"Fails to rejoin the cluster" could mean a number of different things, and it's hard to help without knowing which one it is. What exactly happens? Are there any relevant logs, errors, or API outputs?
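
On the narrower question of deleting the transient setting: assigning null to a cluster setting resets it. Note that, as far as I know, the exclude filter only controls shard allocation and does not by itself stop a node from joining, so the node's own logs are still the place to look. Below is a minimal sketch, not a verified fix for this cluster. ES_HOST and the function names are made up for illustration, and P.P.P.P is the same placeholder used in the question.

```shell
# ES_HOST is an assumed variable; P.P.P.P is the placeholder address from the question.
ES_HOST="${ES_HOST:-P.P.P.P:9200}"

# Setting the key to null removes the transient exclusion.
RESET_BODY='{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}'

# Hypothetical helper: clear the exclusion set earlier.
reset_exclusion() {
  curl -s -XPUT "$ES_HOST/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$RESET_BODY"; echo
}

# Hypothetical helper: show the persistent and transient settings currently in effect.
show_settings() {
  curl -s -XGET "$ES_HOST/_cluster/settings?pretty"
}

# Hypothetical helper: list the nodes the cluster currently sees.
list_nodes() {
  curl -s -XGET "$ES_HOST/_cat/nodes?v"
}
```

Running reset_exclusion, then show_settings and list_nodes, would give the kind of API output asked for above. If the repaired node is still missing from the list_nodes output, the exclusion is probably not the cause.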
