Cluster unstable. Communication problem within cluster

elk11 · May 17, 2019, 6:37pm

Hi,

I have a 5 node elasticsearch cluster (7.0.0). Nodes randomly leave the cluster and join back. This is happening with most of the nodes.

The logs on the nodes look something like this:

failed to connect to node {master-node}{eRSnrCA-QQ6v0CTfQzwSGw}{uCbIdgAPQqiIkke7eBu4pg}{master-node}{x.x.x.x:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [master-node][x.x.x.x:9300] connect_exception
        at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1299) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:99) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.0.0.jar:7.0.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2159) ~[?:?]
        at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-7.0.0.jar:7.0.0]
        at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: master-node/x.x.x.x:9300
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        ... 6 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
        ... 6 more
master node changed {previous [], current [{master-node}{eRSnrCA-QQ6v0CTfQzwSGw}{uCbIdgAPQqiIkke7eBu4pg}{master-node}{x.x.x.x:9300}]}, term: 7, version: 49163, reason: ApplyCommitRequest{term=7, version=49163, sourceNode={master-node}

elk11 · May 21, 2019, 1:27pm

I restarted all nodes on the cluster. That seems to have solved the problem.

system · June 18, 2019, 1:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Make cluster more resilient to network failures - how? Elasticsearch	3	1176	September 15, 2017
Nodes leave and then rejoin the cluster randomly Elasticsearch	1	704	June 13, 2019
Elasticsearch nodes continually disconneting/reconnecting. Resulting in very high number of unassigned shards Elasticsearch	18	2657	September 3, 2020
Nodes keep disconnecting from cluster at random Elasticsearch	8	2791	January 4, 2018
Elastic cluster issue Elasticsearch	8	727	July 5, 2017

Cluster unstable. Communication problem within cluster

Related topics