Determine whether a node is safe to terminate

I'm trying to determine if a node is safe to be terminated after using a routing exclusion. Is the only way to do this to traverse the cluster state and be sure it doesn't have any shards left? I see that the Monitoring UI knows how many shards are on a particular node, but I'm unable to find that information in the API. Thanks!

GET _cluster/health?wait_for_no_relocating_shards&timeout=24h waits for all relocations to finish, up to 24 hours. Or you can poll GET _cat/allocation/NODE_NAME - the first number returned is the number of shards on that node.
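The polling approach could be sketched roughly as below. This is a minimal sketch, not an official client: `base_url`, the helper names, and the 30-second interval are hypothetical, and it assumes the cat API's `format=json` and `h=shards,node` options so the shard count can be read by field name rather than by column position.

```python
import json
import time
import urllib.parse
import urllib.request


def shards_on_node(rows, node_name):
    """Return the shard count for node_name from parsed _cat/allocation rows."""
    for row in rows:
        # Match on the node name so any other row (e.g. UNASSIGNED) is ignored.
        if row.get("node") == node_name:
            return int(row["shards"])
    raise LookupError(f"node {node_name!r} not found in allocation output")


def fetch_allocation(base_url, node_name):
    """GET _cat/allocation/NODE_NAME as JSON (base_url is an assumed address)."""
    node = urllib.parse.quote(node_name, safe="")
    url = f"{base_url}/_cat/allocation/{node}?format=json&h=shards,node"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_until_empty(base_url, node_name, interval=30,
                     fetch=fetch_allocation, sleep=time.sleep):
    """Poll until the node reports zero shards, sleeping between polls."""
    while True:
        if shards_on_node(fetch(base_url, node_name), node_name) == 0:
            return
        sleep(interval)  # other work could be done here instead
```

Usage would look something like `wait_until_empty("http://localhost:9200", "NODE_NAME")`; the `fetch` and `sleep` parameters are only there so the loop can be exercised without a live cluster.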

Thanks @DavidTurner, this is very helpful. Can I rely on the allocations, or is there a chance I could catch it between one relocation finishing and the next starting? I may use the cat API for allocations, as that will let me poll and do other work in between. Using both would give me more peace of mind, though. I’m torn.
Thanks so much for your help/advice.

There is no gap between the end of one relocation and the start of the next. However, if there is a shard that cannot be relocated (e.g. it has an inconsistent set of allocation filters that are not satisfied on any node) then it will remain where it is. So I think a final check of GET _cat/allocation/NODE_NAME is a good idea even if you use the cluster health API to wait.
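Combining the two calls might look like the sketch below: wait on cluster health first, then do the final allocation check. The helper names, `base_url`, and the default `30m` timeout are hypothetical. It assumes the health response carries `timed_out` and `relocating_shards` fields, and that a timed-out wait may come back as HTTP 408 with the JSON body still attached, which is worth verifying against your Elasticsearch version.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def get_json(url):
    """GET a URL and parse its JSON body, including the body of a 408 reply."""
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 408:  # assumed: the health wait timed out, body is still JSON
            return json.load(err)
        raise


def is_safe_to_terminate(health, allocation_rows, node_name):
    """True only if relocations finished and the node holds zero shards."""
    if health.get("timed_out") or health.get("relocating_shards", 0) > 0:
        return False
    for row in allocation_rows:
        if row.get("node") == node_name:
            return int(row["shards"]) == 0
    return False  # node not listed, so nothing was confirmed


def check_node(base_url, node_name, timeout="30m", get=get_json):
    """Wait for relocations to finish, then do the final allocation check."""
    health = get(f"{base_url}/_cluster/health"
                 f"?wait_for_no_relocating_shards=true&timeout={timeout}")
    node = urllib.parse.quote(node_name, safe="")
    rows = get(f"{base_url}/_cat/allocation/{node}?format=json&h=shards,node")
    return is_safe_to_terminate(health, rows, node_name)
```

A `False` result after the health wait has returned is the stuck-shard case described above: relocations are done, but something is still allocated to the node.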

@DavidTurner makes sense. Thanks again!
