Elasticsearch Node failed to send join request to master

Elasticsearch version: 6.2.4
Cluster: 46 nodes, master: host1-0
The cluster had been running for 4 days without any problem.
About 10 hours ago, one data node, host9-0, failed to ping the master.
Since then, host9-0 keeps sending join requests to the master but gets an "a channel closed while connecting" error.

Here is the log (I can provide more logs if needed):

[2019-07-30T14:45:38,919][INFO ][o.e.d.z.ZenDiscovery ] [host9-0] master_left [{host1-0}{XnTUIM0gRGuYPDo07kb7rw}{WhOIwzlvTfuU3O3mAxq88A}{elasticsearch-host1-a}{10.50.55.143:9300}{rack_id=host1}], reason [failed to ping, tried [3] times, each with maximum [1m] timeout]
[2019-07-30T14:45:38,985][WARN ][o.e.d.z.ZenDiscovery ] [host9-0] master left (reason = failed to ping, tried [3] times, each with maximum [1m] timeout), current nodes: nodes:
...
[2019-07-30T14:45:39,166][WARN ][o.e.c.NodeConnectionsService] [host9-0] failed to connect to node {host4-0}{zEqozy7ARYaH9gk5vyCGDg}{udD_q1b5Tz-EEg7Izx4y7g}{elasticsearch-host4-a}{10.50.52.174:9300}{rack_id=host4, box_type=hot} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [host4-0][10.50.52.174:9300] a channel closed while connecting
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:652) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:513) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:331) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:154) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.cluster.NodeConnectionsService$ConnectionChecker.doRun(NodeConnectionsService.java:183) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.4.jar:6.2.4]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
[2019-07-30T14:45:39,238][WARN ][c.f.s.s.t.SearchGuardSSLNettyTransport] [host9-0] exception caught on transport layer [NettyTcpChannel{localAddress=/10.50.53.114:42702, remoteAddress=elasticsearch-host1-d/10.50.55.180:9300}], closing connection
java.lang.IllegalArgumentException: null
at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_91]
at io.netty.buffer.PooledHeapByteBuf.setBytes(PooledHeapByteBuf.java:261) ~[netty-buffer-4.1.16.Final.jar:4.1.16.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106) ~[netty-buffer-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343) ~[netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
[2019-07-30T14:45:39,358][WARN ][i.n.c.AbstractChannelHandlerContext] Failed to mark a promise as failure because it has succeeded already: DefaultChannelPromise@61947277(success)
java.lang.IndexOutOfBoundsException: null
at java.nio.ByteBuffer.wrap(ByteBuffer.java:375) ~[?:1.8.0_91]
at io.netty.buffer.PooledHeapByteBuf.nioBuffer(PooledHeapByteBuf.java:300) ~[netty-buffer-4.1.16.Final.jar:4.1.16.Final]
......
Jul 31 00:27:50 psr-169-56 docker[3817]: [2019-07-31T00:27:50,978][WARN ][c.f.s.s.t.SearchGuardSSLNettyTransport] [host9-0] exception caught on transport layer [NettyTcpChannel{localAddress=/10.50.53.114:9300, remoteAddress=/10.50.55.143:50974}], closing connection
Jul 31 00:27:50 psr-169-56 docker[3817]: java.lang.IllegalArgumentException: null
Jul 31 00:27:50 psr-169-56 docker[3817]: at java.nio.Buffer.position(Buffer.java:244) ~[?:1.8.0_91]
......
Jul 31 00:27:51 psr-169-56 docker[3817]: at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
Jul 31 00:27:51 psr-169-56 docker[3817]: at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Jul 31 00:27:51 psr-169-56 docker[3817]: [2019-07-31T00:27:51,207][INFO ][o.e.d.z.ZenDiscovery ] [host9-0] failed to send join request to master [{host1-0}{XnTUIM0gRGuYPDo07kb7rw}{WhOIwzlvTfuU3O3mAxq88A}{elasticsearch-host1-a}{10.50.55.143:9300}{rack_id=host1}], reason [RemoteTransportException[[host1-0][10.50.55.143:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[host9-0][10.50.53.114:9300] a channel closed while connecting]; ]

If you run telnet host1-0 9200 from host9-0, does that work?

Yes, both 9200 and 9300 work.
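
For anyone repeating the telnet check above on a host without telnet installed, here is a minimal Python sketch of the same test. The helper names can_connect and check_ports are hypothetical, not part of Elasticsearch. Note that it only opens a plain TCP connection, so like telnet it shows that the port is reachable but does not exercise the TLS handshake that the SearchGuard SSL transport in the log performs on 9300.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened.

    Hypothetical helper: mirrors what `telnet host port` tells you,
    nothing more. It does not speak the Elasticsearch transport protocol
    and does not attempt a TLS handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Check several ports on one host and return {port: reachable}."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Run from host9-0, check_ports("host1-0", [9200, 9300]) would return a dict mapping each port to True or False. Since the master also has to connect back to the joining node (the join failure in the log is the master failing to reach host9-0 on 9300), it may be worth running the same check from host1-0 against host9-0 as well.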
