Checking cluster node availability



We cannot use one of the proper Java clients because we're currently stuck with Java 6. For our circuit-breaking node selector, we want to find out, as cheaply as possible, whether a crashed or otherwise unavailable node has become available again. Do the other nodes in an Elasticsearch cluster know a node's current availability? I haven't found that kind of information where I most expected it: in cluster health or node stats/state.

Thanks and best regards

(Magnus Bäck) #2

If the node is listed in the result of, e.g., a /_nodes or /_cluster/state request, it's part of the cluster and is healthy as far as the cluster knows. Or am I misunderstanding your question?


I think we're talking about the same thing. Do you know how up to date the cluster's knowledge is? In other words, if another cluster node lists the node to be checked in, for example, the "_nodes" result, how long ago did it typically last communicate with that node?
Right now, I've implemented a kind of "ping": I issue a "_nodes//network" request and am happy if it answers at all, without caring about the exact result. The point was to use a request that is inexpensive and needs no preparation on our side, but that still produces the usual exceptions, such as SocketTimeoutException, if the problem persists. All we're interested in is whether the node is reachable again, so actually checking cluster health could be too strict a condition.
However, what this approach does not take into account, I imagine, is network partitioning or split-brain: the node could be reachable but still not be part of the cluster. Our check only needs to be a necessary condition for deciding whether to give the node another try with a "real" request; it need not be sufficient in the mathematical sense. Still, would you recommend your approach to us instead, since it is "more sufficient" than ours?
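
For illustration, the "ping" described above could look roughly like the following. This is only a sketch under assumptions, not the code actually in use: the class name NodePing, the method isReachable, and the timeout values are made up for the example, and the URL to request is left to the caller. It sticks to java.net classes available in Java 6 and treats any HTTP response as "reachable", while any IOException (including SocketTimeoutException) counts as "still unavailable".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper; names and defaults are illustrative only.
final class NodePing {

    /**
     * Returns true if the node answers the HTTP request at all,
     * regardless of status code or body. Returns false on any
     * IOException (connection refused, SocketTimeoutException,
     * malformed URL, ...) or if the URL is not an HTTP(S) URL.
     */
    static boolean isReachable(String url, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection conn = null;
        try {
            URLConnection raw = new URL(url).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return false; // only http/https make sense here
            }
            conn = (HttpURLConnection) raw;
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            conn.getResponseCode(); // forces the request; the value is ignored
            return true;
        } catch (IOException e) {
            return false; // node is still considered unavailable
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Usage: java NodePing <url-of-node-endpoint>
        if (args.length != 1) {
            System.err.println("usage: NodePing <url>");
            return;
        }
        // 1000 ms timeouts are an arbitrary choice for this sketch.
        System.out.println(isReachable(args[0], 1000, 1000));
    }
}
```

As noted above, a true result here only says the node answers HTTP again; it says nothing about whether the node has rejoined the cluster, so it remains a necessary rather than a sufficient check.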
