Failed to reconnect to node


#1

Oftentimes in my cluster, the Elasticsearch data nodes try to reach the Elasticsearch master node using an IP address different from the one defined in the master node's elasticsearch.yml with the statement:

network.host: ipaddress

They are actually trying to reach the IP address the master node uses for its own iSCSI devices.
Why is this happening?

As stated in the elasticsearch.yml comments, I expected network.host to set both network.bind_host and network.publish_host, with network.publish_host being the address other nodes use to communicate with this node.
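For reference, this is how I expected the settings to interact in the master node's elasticsearch.yml (the address below is a placeholder, not my real one):

```yaml
# Setting network.host should cover both of the more specific settings:
network.host: 172.17.51.10

# i.e. I expected it to be equivalent to the explicit form:
# network.bind_host: 172.17.51.10     # address the node listens on
# network.publish_host: 172.17.51.10  # address advertised to the other nodes
```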

Below is the log output from one of my data nodes.
Note that 172.17.53.75 is the IP address my master node has configured on the iSCSI subnet.

[2015-10-01 13:31:45,174][WARN ][cluster.service          ] [**DataNode1**] failed to reconnect to node [logstash-**MasterNode**-2287-13610][_XtwgM4ER0OBiPZdCyTS8g][**MasterNode**][inet[/172.17.53.75:9301]]{client=true, data=false}
org.elasticsearch.transport.ConnectTransportException: [logstash-**MasterNode**-2287-13610][inet[/172.17.53.75:9301]] connect_timeout[30s]
	at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:825)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:758)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
	at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:584)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /172.17.53.75:9301
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	... 3 more

Thank you so much,
Daniele


#2

Well, after some troubleshooting I found the problem. I didn't know that Logstash participated in the Elasticsearch cluster; I realized it when I saw it open a socket on the unexpected address and port 9301. It turned out that the statement

host => "ipaddress"

in the elasticsearch plugin of the Logstash output section does not bind Logstash to that IP address; it only tells Logstash which node to contact. To bind it you must use:

bind_host => "ipaddress"

After adding this setting with the expected IP address, I no longer see the exception.
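For anyone hitting the same thing, this is the shape of my fix in the Logstash configuration (addresses and cluster name are placeholders; I can't vouch for every option's default, so check the plugin docs for your version):

```
output {
  elasticsearch {
    cluster   => "mycluster"      # placeholder cluster name
    host      => "172.17.51.10"   # which ES node to contact -- does NOT set the local bind address
    bind_host => "172.17.51.10"   # the local address Logstash binds its transport socket to
  }
}
```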

Regards,
Daniele

