Getting ConnectTimeoutException when joining a cluster even though nodes are reachable

ziw028 · August 19, 2021, 5:57pm

I have Elasticsearch deployed as a two-node remote setup (node1, node2) and am using the Java API to connect to it. Originally the two nodes were in different clusters, and each was the master node of its own cluster. We then made sure that discovery.zen.ping.unicast.hosts in each elasticsearch.yml file contains the hostnames of node1 and node2, and restarted Elasticsearch on both nodes. After the restart, Elasticsearch on node2 is running but is not able to connect to node1 and form a new cluster. I'm seeing the following ConnectTimeoutException.

[WARN ][o.e.d.z.ZenDiscovery     ] [rqnr5CF] failed to connect to master [{AhMmXxh}{AhMmXxhBRTGvK0DyD-CMuQ}{b2m73mOjQyi9xug0abOK9w}{node1 hostname}{node1 ip:9300}], retrying...
org.elasticsearch.transport.ConnectTransportException: [AhMmXxh][node1 ip:9300] connect_timeout[30s]
        at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:361) ~[?:?]
        at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:548) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:472) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:332) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:319) ~[elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:459) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:411) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1188) [elasticsearch-[version].jar:[version]]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-[version].jar:[version]]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: {node1 hostname}/{node1 ip}.92:9300
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]
        at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127) ~[?:?]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ~[?:?]
        ... 1 more

However, both nodes can reach each other using telnet on port 9300. So the nodes are reachable, but Elasticsearch still throws the exception.
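
Since telnet succeeds while the node's own connection attempt times out, it may be worth repeating the check from node2 against the exact address printed in the "failed to connect to master" log line, with an explicit timeout. The following is a minimal Python sketch of such a check, assuming Python 3 is available on the node; `can_connect` is a hypothetical helper name and not part of Elasticsearch.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether an Elasticsearch transport handshake would succeed.
    """
    try:
        # create_connection resolves the host and applies the timeout to the connect call
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

Running `can_connect(host, 9300)` on node2, once with node1's hostname and once with the IP address shown in the log line, would show whether both forms of the address behave the same way.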

Can anyone tell me the reason why this is happening? Thanks!

Christian_Dahlqvist · August 19, 2021, 7:40pm

Did you wipe the data directory of one of the nodes? I believe the cluster state contains the cluster name, so changing it in the config is not necessarily sufficient.

Elango_P · August 24, 2021, 6:51am

Thank you, Christian, for the reply. We have a few more questions:

1. "cluster state contains the cluster name"
How can we check and confirm this inside the data folder? Are there steps for checking it, or a curl command we can use?

2. "Did you wipe the data directory of one of the nodes?"
Which node's data folder do we have to clear? node2 is not able to connect to node1's cluster, so would clearing node2 be enough, or do we have to clear both node1 and node2?

3. Once we have cleared the data folder, how will it be regenerated? Will it be recreated automatically when the Elasticsearch service restarts? If not, would manually copying node1's data folder to node2 fix the problem?

Thanks in advance!
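
On question 1, one possible way to see which cluster each node currently reports is to query the node's HTTP root endpoint (`GET /`), whose response includes `cluster_name` and `cluster_uuid`. The sketch below is only an illustration in Python: it assumes the HTTP port (9200 by default) is reachable without authentication, and `fetch_root_info`, `cluster_identity`, and `same_cluster` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_root_info(base_url: str, timeout: float = 5.0) -> dict:
    """GET the node's root endpoint (e.g. http://<host>:9200/) and parse the JSON reply."""
    with urlopen(base_url, timeout=timeout) as resp:
        return json.load(resp)


def cluster_identity(info: dict) -> tuple:
    """Extract the (cluster_name, cluster_uuid) pair from a root-endpoint response."""
    return (info.get("cluster_name"), info.get("cluster_uuid"))


def same_cluster(info_a: dict, info_b: dict) -> bool:
    """True only if both responses carry a name and UUID and the pairs are equal."""
    id_a, id_b = cluster_identity(info_a), cluster_identity(info_b)
    return None not in id_a and id_a == id_b
```

Calling `same_cluster(fetch_root_info(url_of_node1), fetch_root_info(url_of_node2))` would then indicate whether the two nodes report the same cluster. Two nodes that each formed their own cluster would be expected to report different `cluster_uuid` values even if `cluster_name` matches.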

Elango_P · August 26, 2021, 4:20am

Christian,
any update on this query would help us proceed further.
Kindly do the needful.

Thanks,

Christian_Dahlqvist · August 26, 2021, 5:26am

I believe any new node that is added to an existing cluster must be empty and must not have been part of a cluster before. The data in the data path will be generated when the node joins the cluster. If the node has data that you wish to keep, you should create a snapshot of it using the snapshot API so that you can later restore it into the new cluster.
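
The snapshot step mentioned above could look roughly like the following Python sketch, which builds the two HTTP requests involved: registering a shared-filesystem (`fs`) repository and creating a snapshot in it. This is an illustration under assumptions, not a tested procedure for this cluster: the repository name, snapshot name, and location are placeholders, the helper names are hypothetical, and an `fs` repository location has to be listed under `path.repo` in elasticsearch.yml before it can be registered.

```python
import json
from urllib.request import Request


def register_fs_repo_request(base_url: str, repo: str, location: str) -> Request:
    """Build PUT /_snapshot/<repo> registering a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot_request(base_url: str, repo: str, snapshot: str) -> Request:
    """Build PUT /_snapshot/<repo>/<snapshot>, waiting for the snapshot to finish."""
    return Request(
        f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )
```

Each request object can be sent with `urllib.request.urlopen(request)` against the node that holds the data to keep. After the node has been emptied and has joined the other cluster, the snapshot could be restored there with `POST /_snapshot/<repo>/<snapshot>/_restore`, provided the same repository is registered on the new cluster.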

Elango_P · September 3, 2021, 4:56am

Thanks, Christian, for the update.
We will try it out and let you know in case we need any further clarification.

system · October 1, 2021, 4:56am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
