Node failed to send join request to master

seowcy · March 5, 2019, 3:24am

Hi,

I have an Elasticsearch cluster with 4 nodes (4 Windows hosts, each running an Ubuntu VM in which Elasticsearch runs) that is currently running smoothly without any issues. I am trying to add node 5 to the cluster, but am getting an error.

https://gist.github.com/seowcy/1143496427e09cc1330576d93b51421e

[2019-03-06T02:22:18,190][INFO ][o.e.e.NodeEnvironment    ] [node_5] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [174.8gb], net total_space [192.7gb], types [ext4]
[2019-03-06T02:22:18,193][INFO ][o.e.e.NodeEnvironment    ] [node_5] heap size [1015.6mb], compressed ordinary object pointers [true]
[2019-03-06T02:22:18,197][INFO ][o.e.n.Node               ] [node_5] node name [node_5], node ID [HUrmoO3eStOS1EwtHnazbw]
[2019-03-06T02:22:18,197][INFO ][o.e.n.Node               ] [node_5] version[6.5.4], pid[2392], build[default/tar/d2ef93d/2018-12-17T21:17:40.758843Z], OS[Linux/4.15.0-43-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_191/25.191-b12]
[2019-03-06T02:22:18,197][INFO ][o.e.n.Node               ] [node_5] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.QJEAeC5T, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.path.home=/home/da/Desktop/elastic/elasticsearch-6.5.4, -Des.path.conf=/home/da/Desktop/elastic/elasticsearch-6.5.4/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2019-03-06T02:22:20,794][INFO ][o.e.p.PluginsService     ] [node_5] loaded module [aggs-matrix-stats]
[2019-03-06T02:22:20,794][INFO ][o.e.p.PluginsService     ] [node_5] loaded module [analysis-common]
[2019-03-06T02:22:20,794][INFO ][o.e.p.PluginsService     ] [node_5] loaded module [ingest-common]
[2019-03-06T02:22:20,794][INFO ][o.e.p.PluginsService     ] [node_5] loaded module [lang-expression]
[2019-03-06T02:22:20,794][INFO ][o.e.p.PluginsService     ] [node_5] loaded module [lang-mustache]

[Log truncated here; the full output is in the gist linked above.]

Here is the error from the master node.

https://gist.github.com/seowcy/18cbbd72556b9b105eb627bc5d0efeef

[2019-03-04T19:01:12,558][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [node_2] send message failed [channel: NettyTcpChannel{localAddress=/192.168.10.21:9300, remoteAddress=/192.168.10.24:58352}]
java.nio.channels.ClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]

I've tried deleting the nodes directory, increasing the minimum number of master nodes from 2 to 3, and just recloning the VM. No luck.

Any advice on how to solve this problem?

Thanks!

DavidTurner · March 5, 2019, 8:53am

The connection attempt timed out:

IOException[connection timed out: 192.168.10.24/192.168.10.24:9300]

This normally indicates a connectivity issue.

The log you quote from the master is dated two days earlier than the other logs and therefore doesn't seem related.

seowcy · March 5, 2019, 9:24am

Thanks for the response.

I thought it was a connectivity issue, but am not sure how to go about troubleshooting it further. The log timing is off because I haven't really bothered to sync the clocks on the VMs, but I can confirm that the log on the master appears when I try to connect the node to the cluster.

I am able to ping the other nodes in the cluster from node_5, so I'm not sure what exactly is happening. Any more help would be appreciated.

DavidTurner · March 5, 2019, 10:54am

Not related, but if you have 4 master-eligible nodes then you must set minimum_master_nodes to 3.
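For reference, the figure of 3 comes from the usual quorum rule for `discovery.zen.minimum_master_nodes` in Elasticsearch 6.x: (number of master-eligible nodes / 2) + 1, rounded down. The sketch below only illustrates that arithmetic; the function name is made up for this example and is not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1.

    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Quorum for a few cluster sizes, including the 4-node case in this thread.
for n in (3, 4, 5):
    print(n, "master-eligible nodes -> minimum_master_nodes =", minimum_master_nodes(n))
```

With 4 master-eligible nodes this gives 3, and it stays at 3 if the fifth node is also master-eligible.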

Possibly unrelated, but you should know that Elasticsearch performs some optimisations that assume the nodes' clocks are reasonably in sync. It does the right thing even if they're not closely synchronised, but potentially more slowly and/or using more memory.

I note that you're using TLS (because the master's log mentions o.e.x.s.t.n.SecurityNetty4ServerTransport). It's possible that TLS requires your clocks to be more accurately synchronised, although if that is the cause then I think we should report it more loudly.

ping is rather a different thing from the TCP connections that Elasticsearch expects to be able to open. Try nc or telnet. If you can establish basic connectivity using these tools, try starting the new node with this in its elasticsearch.yml file:

logger.org.elasticsearch.transport: TRACE

This should give much more verbose logs.
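If nc or telnet is not installed on the VM, the same kind of TCP check can be roughed out in a few lines of Python. This is only a sketch: the function name `can_connect` and the 3-second default timeout are assumptions made for this example, and it tests nothing beyond whether a TCP connection can be opened (no TLS, no Elasticsearch handshake).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it does roughly what `nc -z host port` does.
    """
    try:
        # create_connection completes a TCP handshake, which ICMP ping never attempts.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unresolvable hosts.
        return False
```

For example, `can_connect("192.168.10.24", 9300)` run on the master host, and `can_connect("192.168.10.21", 9300)` run on the new node, would exercise the transport port in both directions using the addresses from the logs above. A False result while ping still succeeds would point at something filtering TCP, such as a firewall on one of the hosts.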

seowcy · March 6, 2019, 1:51am

After trying out your suggestions, I realised that this node had Symantec installed (but the other nodes do not) so I just had to configure that and the node joined the cluster. Reminder for myself not to assume that all my nodes are necessarily exactly the same. Thanks!

