Taking a node out of the cluster without affecting nodes that join later

I deployed an ES cluster to an OpenShift/Kubernetes cluster. I want to enable OpenShift's auto-scaling on ES, but I need to control the rate at which nodes are removed to prevent partial data loss. OpenShift supports a preStop hook, which is called before the pod is shut down and waited on until it completes. In this preStop hook I can take the node out of the cluster with shard allocation filtering:

curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}'
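The preStop hook only protects data if it blocks until the shards have actually moved, so setting the filter is usually followed by a wait. A minimal sketch, assuming Elasticsearch answers on `localhost:9200` inside the pod and that `POD_IP` is a hypothetical environment variable holding this pod's IP. It polls `_cat/shards?h=ip` until no shard copy (including copies still relocating away) is reported on that IP; the function names are illustrative only.

```shell
#!/bin/sh
# Sketch of a preStop drain step. Assumptions (not from the original post):
# ES answers on localhost:9200 inside the pod; POD_IP is a hypothetical
# variable holding this pod's IP (e.g. injected via the downward API).
ES_URL="${ES_URL:-http://localhost:9200}"

# Count shard copies whose current node IP equals $1.
# Reads `_cat/shards?h=ip` output on stdin (one IP per line, blank if unassigned).
count_shards_on_ip() {
  awk -v ip="$1" '$1 == ip { c++ } END { print c + 0 }'
}

# Ask the cluster to move all shards off the node with IP $1.
exclude_ip() {
  curl -sf -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "{\"transient\":{\"cluster.routing.allocation.exclude._ip\":\"$1\"}}" >/dev/null
}

# Block until no shard copy is allocated to IP $1 any more.
# If the request fails, retry instead of assuming the node is empty.
wait_until_drained() {
  while :; do
    if shards=$(curl -sf "$ES_URL/_cat/shards?h=ip"); then
      remaining=$(printf '%s\n' "$shards" | count_shards_on_ip "$1")
      [ "$remaining" -eq 0 ] && return 0
      echo "still $remaining shard copies on $1" >&2
    fi
    sleep 5
  done
}
```

In the hook this could be invoked as `exclude_ip "$POD_IP" && wait_until_drained "$POD_IP"`. Kubernetes counts the preStop hook against the pod's `terminationGracePeriodSeconds`, so the grace period has to be long enough for the relocation to finish.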

However, there are still two issues:

  1. The IP I specified can later be reassigned to a new ES node that joins the cluster. The nodes are cattle, not pets, so name, ID, and IP are all ephemeral. How can I prevent the routing filter from affecting nodes that join in the future?
  2. More than one node can be shutting down at the same time. How can I prevent those concurrent PUT requests from overwriting each other?

What I am looking for is a way to notify the cluster that a node is leaving. The cluster should react, but without recording the request, even transiently. A node that joins later and matches the routing filter criteria should automatically remove the filter when joining.

What version of Elasticsearch are you using? In ES 5.0+, node IDs are persistent, so an ID should be both unique (per node) and persistent (it stays the same if you accidentally restart the same node).
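Excluding by the persistent node ID instead of the IP means a later node that happens to reuse the IP is not matched, as long as that node starts with a fresh data directory (a pod that reattaches the same data volume keeps its old ID and would still be excluded). A minimal sketch, assuming Elasticsearch answers on `localhost:9200`, that `POD_IP` is a hypothetical environment variable with this pod's IP, and that your version's allocation filtering accepts the `_id` attribute (worth checking in its docs). It looks up this node's ID via `_cat/nodes` at shutdown time, while the pod still owns the IP; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: exclude this node by its persistent node ID (ES 5.0+) instead of IP.
# Assumptions: ES on localhost:9200; POD_IP is a hypothetical env var holding
# this pod's IP; allocation filtering supports `_id` in your ES version.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the node ID whose IP equals $1.
# Reads `_cat/nodes?h=id,ip&full_id=true` output ("<id> <ip>" per line) on stdin.
node_id_for_ip() {
  awk -v ip="$1" '$2 == ip { print $1; exit }'
}

# Exclude the node with ID $1 from shard allocation.
exclude_node_id() {
  curl -sf -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "{\"transient\":{\"cluster.routing.allocation.exclude._id\":\"$1\"}}" >/dev/null
}

# Resolve the ID of the node that currently owns IP $1, then exclude that ID.
exclude_self_by_id() {
  nodes=$(curl -sf "$ES_URL/_cat/nodes?h=id,ip&full_id=true") || return 1
  id=$(printf '%s\n' "$nodes" | node_id_for_ip "$1")
  [ -n "$id" ] || return 1   # IP not found: do nothing rather than guess
  exclude_node_id "$id"
}
```

Usage would be `exclude_self_by_id "$POD_IP"`. This PUT still replaces any other ID already in the setting, so on its own it does not address concurrent shutdowns.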

More than one node can be shutting down at the same time. How can I prevent those concurrent PUT requests from overwriting each other?

There's no built-in way of doing this (i.e., checking that the setting was not changed underneath you between reading it and updating it), but you could exclude both nodes at once by comma-separating the values.
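Building on the comma-separated form, one possible sketch is a read-modify-write that adds this node's ID to whatever list is already set. As stated above, it is not atomic: two nodes that read the list at the same moment can still overwrite each other, so this narrows the race rather than closing it. Assumptions: Elasticsearch on `localhost:9200`, transient settings, compact JSON from `_cluster/settings?flat_settings=true&filter_path=transient`, and node IDs without commas or quotes. `exclude_node_id_shared` and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of a merge-then-write exclusion. NOT atomic (no compare-and-set in ES).
# Assumptions: ES on localhost:9200; compact flat-settings JSON output.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print list $1 with $2 appended as a comma-separated entry, unless already present.
csv_add() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;        # already in the list
    ",,")     printf '%s\n' "$2" ;;        # list was empty
    *)        printf '%s,%s\n' "$1" "$2" ;;
  esac
}

# Extract the current exclude._id value from flat-settings JSON on stdin.
extract_excluded_ids() {
  sed -n 's/.*"cluster\.routing\.allocation\.exclude\._id":"\([^"]*\)".*/\1/p'
}

# Read the current list, add ID $1, and write the merged list back.
# A concurrent writer between the GET and the PUT can still be lost.
exclude_node_id_shared() {
  json=$(curl -sf "$ES_URL/_cluster/settings?flat_settings=true&filter_path=transient") || return 1
  current=$(printf '%s\n' "$json" | extract_excluded_ids)
  updated=$(csv_add "$current" "$1")
  curl -sf -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "{\"transient\":{\"cluster.routing.allocation.exclude._id\":\"$updated\"}}" >/dev/null
}
```

Failing the GET aborts instead of writing a list that contains only this node's ID. IDs of nodes that have already left stay in the list until something removes them; since new nodes get new IDs, they should not match those entries.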
