Issue with Coordinator node down

softwareklinic · May 22, 2018, 5:53am

We are using Elasticsearch 5.x in production. We have a Java-based application that communicates with Elasticsearch using the TCP transport API, and we pass the host and port of 4 coordinator nodes when establishing the connection.

However, we observed that when the first node in the list (a single node) went down, none of the clients were able to connect, and the entire site that depends on Elasticsearch was down.

Any thoughts? Feel free to ask for more details if need be. We would appreciate an urgent response from anyone who can provide any insights.

Thank you
Keyur

DavidTurner · May 22, 2018, 7:43am

Which version, exactly, are you using?

Are you using sniffing (client.transport.sniff)?

Are you keeping the Elasticsearch Client object alive for an extended period of time, or are you creating a new one each time you need to interact with Elasticsearch?

Are there any interesting log messages? Can you reproduce the problem with DEBUG-level logging within org.elasticsearch.client.transport and provide logs?

softwareklinic · May 22, 2018, 3:23pm

Version - 5.1.2

We are not using sniffing.

The client is kept alive for an extended period of time, and we reuse it for all queries.

Below is the code snippet creating the client.

// Build the client settings with the cluster name.
Settings settings = Settings.builder()
        .put(EL_CLUTER_NAME, clusterName)
        .build();
TransportClient client = new PreBuiltTransportClient(settings);
// Register every configured coordinating node (comma-separated "host:port" entries).
for (String coordinatedNode : StringUtils.split(coordinatedNodes, COMMA)) {
    String hostName = substringBeforeLast(coordinatedNode, COLON);
    String port = substringAfterLast(coordinatedNode, COLON);
    client.addTransportAddress(
            new InetSocketTransportAddress(InetAddress.getByName(hostName), Integer.parseInt(port)));
}
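
One thing that may be worth ruling out with a loop like the one above: InetAddress.getByName throws UnknownHostException when a host name cannot be resolved, so a failed lookup on an early entry would abort the loop before the later nodes are added. The following is a minimal sketch of a stricter parser using only the JDK. NodeListParser and parse are hypothetical names, not part of Elasticsearch or of the application in this thread, and the sketch assumes entries are plain "host:port" pairs.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Elasticsearch. */
final class NodeListParser {

    private NodeListParser() {}

    /** Parses "host1:9300,host2:9300" into unresolved socket addresses. */
    static List<InetSocketAddress> parse(String commaSeparatedNodes) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : commaSeparatedNodes.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and blank entries
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, colon);
            // parseInt throws NumberFormatException for a non-numeric port.
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved performs no DNS lookup, so one bad host name
            // cannot prevent the remaining entries from being parsed.
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        return addresses;
    }
}
```

Each returned address could then be resolved and handed to client.addTransportAddress inside its own try/catch, so that a failure on one entry is logged rather than fatal for the rest.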

Our understanding of sniffing is that once we enable it, the client will replace the coordinating nodes provided at creation with the data nodes it finds using the internal cluster state API. Please let us know if this is wrong.

Also, what is the use of calling the following on the client?

client.connectedNodes(); //We are not doing this but found in other implementations

We will enable DEBUG logs and provide any other log messages.

DavidTurner · May 22, 2018, 4:49pm

Thanks. That rules out a few of the more obvious things that might be happening here.

Yes, your understanding of sniffing is correct. I wasn't recommending that you enable it, by the way; I was just asking because the way the transport client connects to nodes differs depending on whether sniffing is enabled.

client.connectedNodes() returns the list of connected nodes, but has no other effects.

Looking through the logs is the next step to take. It's worth pointing out that the end-of-life date for 5.1.2 is in less than three weeks, and newer versions have seen changes that might relate to this issue. Upgrading is recommended.

softwareklinic · May 22, 2018, 5:04pm

We will try to get logs for the event timeframe, but we still wanted to check: if we are not using sniffing and we have 4 coordinator nodes, why would just 1 node failing compromise the querying capability of all the clients?

Any thoughts in the interim?
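
The expectation behind this question can be made concrete with a small model. Without sniffing, the transport client is documented to use the configured addresses in round-robin fashion, so a single unavailable node should only cause that node to be skipped. The sketch below is a conceptual model of that expectation, not the TransportClient implementation; RoundRobinNodes and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Conceptual model only; not how TransportClient is actually implemented. */
final class RoundRobinNodes {
    private final List<String> nodes;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> configuredNodes) {
        this.nodes = new ArrayList<>(configuredNodes);
    }

    void markDown(String node) {
        down.add(node);
    }

    void markUp(String node) {
        down.remove(node);
    }

    /** Returns the next available node in rotation, or throws if all are down. */
    String next() {
        // Try each configured node at most once per call.
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
            String candidate = nodes.get(index);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("None of the configured nodes are available");
    }
}
```

In this model, requests fail only when all 4 nodes are marked down. If the real system failed with 1 node down, the cause is more likely something outside this selection logic, which is what the logs should help narrow down.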

TimV · May 23, 2018, 2:12am

That's what we're trying to work out.

A couple of extra questions:

What is the network connection between the client and the coordinating nodes? Is there anything special like firewalls, packet filters, or load balancers?
When you say "1st node in the LIST (one node) - went down", what exactly do you mean? Did the underlying machine get shut down? Was Elasticsearch shut down gracefully, or did it crash? Did the Elasticsearch process completely stop, or did it hang?
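
These cases look different at the TCP level, and a quick probe from the client host can help tell them apart. The sketch below uses only the JDK; NodeProbe and isReachable are hypothetical names, and the host, port, and timeout values are whatever applies to the deployment. It opens a TCP connection and nothing more, so it does not speak the Elasticsearch transport protocol.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic helper; checks TCP connectivity only. */
final class NodeProbe {

    private NodeProbe() {}

    /**
     * Returns true if a TCP connection to host:port is established within
     * timeoutMillis, and false on refusal, timeout, or an unresolvable host.
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers ConnectException (refused), SocketTimeoutException
            // (no answer in time), and UnknownHostException.
            return false;
        }
    }
}
```

A stopped process on a running machine usually refuses the connection immediately, whereas a powered-off machine or a firewall that silently drops packets tends to make the attempt wait for the full timeout. A hung process may still accept the TCP connection, so a true result here does not prove the node is healthy.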

system · June 20, 2018, 2:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
