Failed to send join request to master, discovery timed out

Hey!

I'm setting up a cluster and have run into a problem that I hope you can help me with.
The cluster has 2 master nodes (master-1 and master-2) running on the same server, plus a data node on another server.
The two master nodes work fine: they form the cluster and elect a master. The data node, however, fails to "send join request to master". It logs the name of the elected master, so I guess it can connect to the other server?

Have I missed something obvious here or could it still be some network issue? Grateful for any help or pointers in the right direction!

Included the relevant configs and log below:

master-1 config:

cluster.name: cluster1
node.name: master-1
network.host: ["_local_", 145.xxx.yyy.z2]
network.publish_host: 145.xxx.yyy.z2
http.port: 9200
discovery.zen.ping.unicast.hosts: ["145.xxx.yyy.z2:9300", "145.xxx.yyy.z2:9301"]
discovery.zen.minimum_master_nodes: 2
node.data: true
node.master: true

master-2 config:

cluster.name: cluster1
node.name: master-2
network.host: ["_local_", 145.xxx.yyy.z2]
network.publish_host: 145.xxx.yyy.z2
http.port: 9201
discovery.zen.ping.unicast.hosts: ["145.xxx.yyy.z2:9300", "145.xxx.yyy.z2:9301"]
discovery.zen.minimum_master_nodes: 2
node.data: true
node.master: true

data-1 config:

cluster.name: cluster1
node.name: data-1
network.host: ["_local_", 145.zzz.yyy.x6]
network.publish_host: 145.zzz.yyy.x6
http.port: 9200
discovery.zen.ping.unicast.hosts: ["145.xxx.yyy.z2:9300", "145.xxx.yyy.z2:9301"]
discovery.zen.minimum_master_nodes: 2
node.data: true
node.master: false

data-1 log:

[2017-11-16T16:15:52,304][INFO ][o.e.n.Node               ] [data-1] initializing ...
[2017-11-16T16:15:52,414][INFO ][o.e.e.NodeEnvironment    ] [data-1] using [1] data paths, mounts [[(F:)]], net usable_space [24.8gb], net total_space [24.9gb], types [NTFS]
[2017-11-16T16:15:52,414][INFO ][o.e.e.NodeEnvironment    ] [data-1] heap size [990.7mb], compressed ordinary object pointers [true]
[2017-11-16T16:15:52,414][INFO ][o.e.n.Node               ] [data-1] node name [data-1], node ID [EiurCDyGRS6IrhmaXI6Pjw]
[2017-11-16T16:15:52,414][INFO ][o.e.n.Node               ] [data-1] version[6.0.0], pid[884], build[8f0685b/2017-11-10T18:41:22.859Z], OS[Windows Server 2012 R2/6.3/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_144/25.144-b01]
[2017-11-16T16:15:52,414][INFO ][o.e.n.Node               ] [data-1] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Delasticsearch, -Des.path.home=F:\elasticsearch-6.0.0, -Des.path.conf=F:\elasticsearch-6.0.0\config, exit, -Xms1024m, -Xmx1024m, -Xss1024k]
[2017-11-16T16:15:53,429][INFO ][o.e.p.PluginsService     ] [data-1] loaded module [aggs-matrix-stats]
...
[2017-11-16T16:15:53,429][INFO ][o.e.p.PluginsService     ] [data-1] no plugins loaded
[2017-11-16T16:15:54,961][INFO ][o.e.d.DiscoveryModule    ] [data-1] using discovery type [zen]
[2017-11-16T16:15:55,695][INFO ][o.e.n.Node               ] [data-1] initialized
[2017-11-16T16:15:55,695][INFO ][o.e.n.Node               ] [data-1] starting ...
[2017-11-16T16:15:55,992][INFO ][o.e.t.TransportService   ] [data-1] publish_address {145.zzz.yyy.x6:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}, {145.zzz.yyy.x6:9300}
[2017-11-16T16:15:56,007][INFO ][o.e.b.BootstrapChecks    ] [data-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-16T16:16:20,196][INFO ][o.e.d.z.ZenDiscovery     ] [data-1] failed to send join request to master [{master-1}{JNyN2w61QOaaPNXgxPc0eQ}{S8Ugwd7kSeS-9xUSL_SiDg}{145.xxx.yyy.z2}{145.xxx.yyy.z2:9300}], reason [RemoteTransportException[[master-1][145.xxx.yyy.z2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[data-1][145.zzz.yyy.x6:9300] connect_timeout[30s]]; nested: IOException[Connection timed out: no further information: 145.zzz.yyy.x6/145.zzz.yyy.x6:9300]; nested: IOException[Connection timed out: no further information]; ]
[2017-11-16T16:16:26,055][WARN ][o.e.n.Node               ] [data-1] timed out while waiting for initial discovery state - timeout: 30s
[2017-11-16T16:16:26,118][INFO ][o.e.h.n.Netty4HttpServerTransport] [data-1] publish_address {145.zzz.yyy.x6:9200}, bound_addresses {127.0.0.1:9200}, {[::1]:9200}, {145.zzz.yyy.x6:9200}
[2017-11-16T16:16:26,118][INFO ][o.e.n.Node               ] [data-1] started
[2017-11-16T16:16:44,243][INFO ][o.e.d.z.ZenDiscovery     ] [data-1] failed to send join request to master [{master-1}{JNyN2w61QOaaPNXgxPc0eQ}{S8Ugwd7kSeS-9xUSL_SiDg}{145.xxx.yyy.z2}{145.xxx.yyy.z2:9300}], reason [RemoteTransportException[[master-1][145.xxx.yyy.z2:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[data-1][145.zzz.yyy.x6:9300] connect_timeout[30s]]; nested: IOException[Connection timed out: no further information: 145.zzz.yyy.x6/145.zzz.yyy.x6:9300]; nested: IOException[Connection timed out: no further information]; ]

Thanks!

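It looks like the data node can connect to the master nodes, but the reverse connection does not work: the nested ConnectTransportException in your log shows master-1 timing out while connecting back to data-1 at 145.zzz.yyy.x6:9300. Can you check from the master node machine whether data-1 is reachable at 145.zzz.yyy.x6:9300?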
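
If no other tool is at hand, a plain TCP connect from the master machine is enough to test this. The following is a minimal sketch, assuming Python 3 is installed there; `can_connect` is a hypothetical helper name and not part of Elasticsearch, and the address you pass in is data-1's publish address from your log.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python check_port.py <data-node-address> [port]")
    else:
        host = sys.argv[1]
        # 9300 is the transport port shown in the data-1 log.
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 9300
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

Run it on the master server with data-1's address as the first argument. If it reports the port as not reachable while data-1 is running, something between the two servers (for example a firewall on the data node's machine) is probably blocking inbound connections on 9300.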
