We have a 3-node Elasticsearch cluster behind an AWS ELB. Our Java application communicates with the cluster using the Elasticsearch Java client, pointing at the ELB.
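For context, the client is wired up roughly like this (ES 1.x-style TransportClient; the cluster name and ELB hostname below are placeholders, and our production settings may differ slightly):

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Build client settings; "our-cluster" is a placeholder for the real cluster name.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "our-cluster")
        .build();

// Point the transport client at the ELB's DNS name on the transport port.
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("elb.internal.example.com", 9300));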
We have been noticing that the application intermittently loses access to the cluster. There is no repeatable pattern to the disconnects, nor do the log messages on either end show any meaningful information about why a disconnect happened.
Typical log message in the ES client (application) logs:
[ INFO] 2018-02-25 03:53:22,758 org.elasticsearch.client.transport - [Amiko Kobayashi] failed to get node info for [#transport#-1][localhost][inet[indexer.xxx.xxx.xxx.xxx/10.78.5.137:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[indexer.xxx.xxx.xxx.xxx/10.78.5.137:9300]][cluster:monitor/nodes/info] request_id [151663] timed out after [5000ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
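The [5000ms] above matches the transport client's default ping timeout (client.transport.ping_timeout, 5s by default). Raising it is one knob we are aware of, along these lines (the 30s values are guesses we have not validated, and this would presumably only mask the disconnects rather than explain them):

// Longer ping timeout and node-sampler interval for the transport client.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "our-cluster")                     // placeholder, as above
        .put("client.transport.ping_timeout", "30s")            // default 5s
        .put("client.transport.nodes_sampler_interval", "30s")  // default 5s
        .build();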
Typical log message in the ES node logs:
[2018-02-28 21:45:14,589][TRACE][transport.netty ] [xxx.xxx.xxx.xxx.xxx.xxx.xxx.xxx] close connection exception caught on transport layer [[id: 0xa52121cb, /10.78.5.120:19418 :> /10.78.5.115:9300]], disconnecting from relevant node
java.nio.channels.ClosedChannelException
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It is also worth noting that this issue has happened at idle times (with very little data in the cluster) as well as while a significant amount of data was being ingested, so it does not appear to be load-related.
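As one diagnostic we are considering bypassing the ELB and registering the three node addresses with the client directly (the hostnames below are placeholders), to see whether the disconnects follow the load balancer:

// Same Settings as in the first snippet; es-node-1..3 are placeholders
// for the actual node addresses.
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("es-node-1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("es-node-2", 9300))
        .addTransportAddress(new InetSocketTransportAddress("es-node-3", 9300));

If the problem goes away in that configuration, we would look more closely at how the ELB treats the client's long-lived transport connections (for example, its idle timeout).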
Any ideas on where to focus the investigation would be much appreciated.