Getting ConnectTimeoutException when joining a cluster even though the nodes are reachable

I have Elasticsearch deployed on a two-node setup (node1, node2) on remote machines and use the Java API to connect to it. Originally the two nodes were in different clusters, and each was the master node of its own cluster. We then made sure that discovery.zen.ping.unicast.hosts in each elasticsearch.yml file contains the hostnames of node1 and node2, and restarted Elasticsearch on both nodes. After the restart, Elasticsearch on node2 is running but cannot connect to node1 to form a new cluster. I'm seeing the following ConnectTimeoutException.

[WARN ][o.e.d.z.ZenDiscovery     ] [rqnr5CF] failed to connect to master [{AhMmXxh}{AhMmXxhBRTGvK0DyD-CMuQ}{b2m73mOjQyi9xug0abOK9w}{node1 hostname}{node1 ip:9300}], retrying...
org.elasticsearch.transport.ConnectTransportException: [AhMmXxh][node1 ip:9300] connect_timeout[30s]
        at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:361) ~[?:?]
        at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:548) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:472) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:332) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:319) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:459) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:411) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1188) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-[version].jar:[version]]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: {node1 hostname}/{node1 ip}.92:9300
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]
        at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127) ~[?:?]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ~[?:?]
        ... 1 more

However, each node can reach the other with telnet on port 9300. So the nodes are reachable, but Elasticsearch still throws the exception.
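
The telnet check described above amounts to opening a plain TCP connection with a timeout. A minimal sketch of the same check, so it can be repeated from each node, is shown here. The function name `can_connect` and the 5-second default are illustrative choices, not part of Elasticsearch. The check is most useful when it targets the exact address shown in the log line ({node1 ip}:9300), since that is the address Elasticsearch itself is trying to reach, which may differ from the hostname typed into telnet.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (placeholder address): run on node2 against node1's transport port.
# can_connect("<node1 ip>", 9300)
```

If this returns True for the hostname but False for the IP printed in the log, the two are not the same endpoint from node2's point of view.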

Can anyone tell me the reason why this is happening? Thanks!

Did you wipe the data directory of one of the nodes? I believe the cluster state contains the cluster name, so changing it in the config is not necessarily sufficient.
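
One way to see which cluster each node currently reports is the HTTP root endpoint (GET / on the HTTP port), whose JSON response includes `cluster_name` and `cluster_uuid`. The sketch below compares the two nodes. It assumes the HTTP port is the default 9200 and that no authentication is required. The helper names are made up for illustration.

```python
import json
from urllib.request import urlopen


def cluster_identity(root_response):
    """Extract (cluster_name, cluster_uuid) from the JSON body of GET /."""
    body = json.loads(root_response)
    return body.get("cluster_name"), body.get("cluster_uuid")


def fetch_cluster_identity(base_url, timeout=5.0):
    """Call the root endpoint of one node, e.g. http://node1:9200, and parse it."""
    with urlopen(base_url + "/", timeout=timeout) as resp:
        return cluster_identity(resp.read().decode("utf-8"))


def same_cluster(identity_a, identity_b):
    """Treat two nodes as one cluster only if both name and UUID are present and equal."""
    return None not in identity_a and identity_a == identity_b


# Example (placeholder hostnames):
# a = fetch_cluster_identity("http://node1:9200")
# b = fetch_cluster_identity("http://node2:9200")
# print(a, b, same_cluster(a, b))
```

The same information is visible with `curl http://node1:9200/`. Matching names with different UUIDs would suggest the nodes still carry state from two separate clusters.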

Thank you, Christian, for the reply. We have a few follow-up questions:

1. "cluster state contains the cluster name"
How can we check and confirm this inside the data folder? Are there steps for this, or a curl command we can use?

2. "Did you wipe the data directory of one of the nodes?"
Which node's data folder do we have to clear? node2 is not able to connect to node1's cluster, so would clearing node2 be enough, or do we have to clear both node1 and node2?

3. Once we have cleared the data folder, how will it be generated again? Will it be re-created automatically when the Elasticsearch service restarts? If not, would manually copying node1's data folder to node2 fix the problem?

Thanks in advance!

Christian,
any update on this query would help us proceed further.
Kindly do the needful.

Thanks,

I believe any new node that is added to an existing cluster must be empty and must not have been part of a cluster before. The data in the data path will be generated when the node joins the cluster. If the node has data that you wish to keep, you should create a snapshot of it using the snapshot API so you can later restore it into the new cluster.
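
As a rough illustration of the snapshot step, the sketch below builds the three snapshot API requests involved: registering a shared-filesystem repository, creating a snapshot, and restoring it. The repository name, snapshot name, and location are placeholders. It assumes the location is listed under `path.repo` in elasticsearch.yml and is reachable from the nodes. The helper functions are illustrative wrappers, not part of any Elasticsearch client.

```python
import json
from urllib.request import Request, urlopen


def register_repo_request(base_url, repo, location):
    """Build PUT /_snapshot/<repo> registering a shared-filesystem ("fs") repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot_request(base_url, repo, snapshot):
    """Build PUT /_snapshot/<repo>/<snapshot> that waits for the snapshot to finish."""
    return Request(
        f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )


def restore_request(base_url, repo, snapshot):
    """Build POST /_snapshot/<repo>/<snapshot>/_restore for the target cluster."""
    return Request(f"{base_url}/_snapshot/{repo}/{snapshot}/_restore", method="POST")


def send(request, timeout=60.0):
    """Send a prepared request and return the decoded JSON response."""
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (placeholder names), run against the node whose data should be kept:
# send(register_repo_request("http://node2:9200", "my_backup", "/mount/backups/my_backup"))
# send(create_snapshot_request("http://node2:9200", "my_backup", "snapshot_1"))
```

To restore, the same repository would need to be registered on the new cluster first, after which `restore_request` can be sent to one of its nodes.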

Thanks, Christian, for the update.
We will try it out and let you know in case we need any further clarification.
