Checking cluster node availability

Arcimboldo · June 25, 2015, 2:24pm

Hi,

We cannot use one of the proper Java clients because we're currently stuck with Java 6. For our circuit-breaking node selector, we want to find out, as inexpensively as possible, whether a crashed or otherwise unavailable node has become available again. Do the other nodes in an Elasticsearch cluster know about a node's current availability? I haven't found that kind of information where I expected it most: in the cluster health or node stats/state APIs.

Thanks and best regards
Heiko

magnusbaeck · June 25, 2015, 5:26pm

If the node is listed in, e.g., the result of a /_nodes or /_cluster/state request, it's part of the cluster and is healthy as far as the cluster knows. Or am I misunderstanding your question?
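To act on this suggestion from a Java 6 client without a JSON library, one option is to ask a node that is known to be up for the current membership list and look for the suspect node by name. The sketch below assumes a cluster version that has the `_cat/nodes` API, whose `h=name` parameter returns one plain-text node name per line, and it assumes node names are unique in the cluster. The class `ClusterMembershipCheck`, its methods and the example host are hypothetical. It avoids lambdas and try-with-resources so it should compile under Java 6.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: asks a healthy node which nodes the cluster currently
 * contains and checks whether a given node name is among them.
 */
class ClusterMembershipCheck {

    /** Splits plain-text "_cat/nodes?h=name" output into trimmed, non-empty names. */
    static List<String> parseNodeNames(String catOutput) {
        List<String> names = new ArrayList<String>();
        for (String line : catOutput.split("\n")) {
            String name = line.trim(); // also strips "\r" and column padding
            if (name.length() > 0) {
                names.add(name);
            }
        }
        return names;
    }

    /** True if nodeName appears in the membership list reported by the cluster. */
    static boolean isListed(String catOutput, String nodeName) {
        return parseNodeNames(catOutput).contains(nodeName);
    }

    /** Fetches the node list from a node assumed healthy, e.g. "http://es-node-1:9200" (hypothetical). */
    static String fetchNodeNames(String healthyNodeBaseUrl, int timeoutMillis) throws IOException {
        URL url = new URL(healthyNodeBaseUrl + "/_cat/nodes?h=name");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            InputStream in = conn.getInputStream();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            reader.close();
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would look like `ClusterMembershipCheck.isListed(ClusterMembershipCheck.fetchNodeNames("http://es-node-1:9200", 2000), "es-node-2")`. Parsing is kept separate from fetching so the membership logic can be checked without a running cluster.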

Arcimboldo · June 26, 2015, 8:59am

I think we're talking about the same thing. Do you know how up to date the cluster's knowledge is? In other words, if another cluster node lists the node to be checked in, for example, the "_nodes" result, how long ago did it typically last communicate with that node?
Right now, I've implemented a kind of "ping" where I issue a "_nodes//network" request and am happy if it answers at all, without caring about the exact result. The point was to use a request that is inexpensive and needs no preparation on our side, but still raises the usual java.net exceptions, such as SocketTimeoutException, if the problem persists. All we're interested in is whether the node is reachable again, so actually checking cluster health could be too strict a condition.
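A minimal Java 6-compatible sketch of the "ping" described above: any HTTP status, even an error status, counts as reachable, while any IOException (connection refused, SocketTimeoutException, unknown host, malformed URL) counts as unreachable. The class `NodePinger`, its signature and the timeout values are hypothetical. The request path is left to the caller, so it can be the request from the post or simply `/`, which Elasticsearch answers with basic node information.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Hypothetical reachability probe: any HTTP response means "reachable";
 * connection failures and timeouts mean "not reachable".
 */
class NodePinger {

    static boolean isReachable(String baseUrl, String path, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(baseUrl + path).openConnection();
            conn.setConnectTimeout(timeoutMillis); // bounds the TCP connect
            conn.setReadTimeout(timeoutMillis);    // bounds waiting for the response
            conn.setRequestMethod("GET");
            // getResponseCode() returns normally for 2xx, 4xx and 5xx alike,
            // so even an error status proves the node's HTTP layer answered.
            conn.getResponseCode();
            return true;
        } catch (IOException e) {
            // ConnectException, SocketTimeoutException, UnknownHostException,
            // MalformedURLException, ... all mean "do not retry this node yet".
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

For example, `NodePinger.isReachable("http://es-node-2:9200", "/", 2000)` (hypothetical host) returns false instead of hanging when the node is down. Short timeouts matter here, because the default of no timeout would let a single dead node block the selector.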
However, what this approach does not account for, I suppose, is network partitioning or split-brain: the node could be reachable but still not be part of the cluster. Our check is only meant as a necessary condition for deciding whether to give the node another try with a "real" request; it need not be sufficient in the mathematical sense of the word. Still, would you recommend your approach over ours, since it comes closer to being sufficient?
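The necessary-but-not-sufficient idea can be sketched as a small circuit breaker: a node that failed a real request is skipped for a fixed back-off, after which it must pass the cheap probe before it receives real requests again. Everything here (`CircuitBreakingNodeSelector`, the `Probe` and `Clock` interfaces, the fixed back-off) is a hypothetical design, not Heiko's implementation or an Elasticsearch API. Because the probe is only a necessary condition, failures of real requests still have to be reported through `markFailed`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical circuit-breaking node selector. A failed node is skipped until
 * retryAfterMillis has passed; then a cheap probe must succeed before the node
 * is offered for real requests again.
 */
class CircuitBreakingNodeSelector {

    /** The cheap check, e.g. an HTTP GET with short timeouts. */
    interface Probe {
        boolean isReachable(String node);
    }

    /** Abstracted clock so the logic can be tested deterministically. */
    interface Clock {
        long nowMillis();
    }

    private final List<String> nodes;
    private final Probe probe;
    private final Clock clock;
    private final long retryAfterMillis;
    // node -> time of the last failure; absent means "considered healthy"
    private final Map<String, Long> failedAt = new HashMap<String, Long>();

    CircuitBreakingNodeSelector(List<String> nodes, Probe probe, Clock clock, long retryAfterMillis) {
        this.nodes = new ArrayList<String>(nodes);
        this.probe = probe;
        this.clock = clock;
        this.retryAfterMillis = retryAfterMillis;
    }

    /** Call whenever a real request to this node fails. */
    synchronized void markFailed(String node) {
        failedAt.put(node, clock.nowMillis());
    }

    /** Nodes that may receive real requests now, in their configured order. */
    synchronized List<String> availableNodes() {
        List<String> result = new ArrayList<String>();
        long now = clock.nowMillis();
        for (String node : nodes) {
            Long failed = failedAt.get(node);
            if (failed == null) {
                result.add(node);
            } else if (now - failed >= retryAfterMillis) {
                if (probe.isReachable(node)) {
                    failedAt.remove(node); // probe passed: close the breaker
                    result.add(node);
                } else {
                    failedAt.put(node, now); // still down: restart the back-off
                }
            }
        }
        return result;
    }
}
```

The probe runs while the lock is held, which keeps the sketch simple but makes other callers wait during a slow probe; a production version might probe outside the lock or from a background thread.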
Thanks
Heiko
