Failed to reconnect to node


#1

Oftentimes in my cluster, the Elasticsearch data nodes try to reach the Elasticsearch master node using an IP address different from the one defined in the master node's elasticsearch.yml with the statement:

network.host: ipaddress

They are actually trying to reach the IP address the master node uses for its own iSCSI devices.
Why is this happening?

As stated in the elasticsearch.yml comments, I expected network.host to set both network.bind_host and network.publish_host, with network.publish_host being the address other nodes use to communicate with this node.
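For reference, this is how I expected the settings to interact in the master node's elasticsearch.yml (the address below is a placeholder, not my real one):

```yaml
# Setting network.host should cover both of the more specific settings:
network.host: 172.17.51.10

# i.e. I expected it to be equivalent to the explicit form:
# network.bind_host: 172.17.51.10     # address the node listens on
# network.publish_host: 172.17.51.10  # address advertised to the other nodes
```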

Below is the log output from one of my data nodes.
Note that 172.17.53.75 is the IP address my master node has configured on the iSCSI subnet.

[2015-10-01 13:31:45,174][WARN ][cluster.service          ] [**DataNode1**] failed to reconnect to node [logstash-**MasterNode**-2287-13610][_XtwgM4ER0OBiPZdCyTS8g][**MasterNode**][inet[/172.17.53.75:9301]]{client=true, data=false}
org.elasticsearch.transport.ConnectTransportException: [logstash-**MasterNode**-2287-13610][inet[/172.17.53.75:9301]] connect_timeout[30s]
	at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:825)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:758)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
	at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:584)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /172.17.53.75:9301
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
	at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	... 3 more

Thank you so much,
Daniele


#2

Well, after some troubleshooting I found the problem. I didn't know that Logstash participated in the Elasticsearch cluster; I realized it when I saw it open a socket on the unexpected address and port 9301. It turned out that the statement

host => "ipaddress"

in the elasticsearch plugin of the Logstash output section does not bind Logstash to that IP address; it only tells Logstash which node to contact. To bind it you must use:

bind_host => "ipaddress"

After adding this setting with the expected IP address, I no longer see the exception.
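For anyone hitting the same thing, this is the shape of my fix in the Logstash configuration (addresses and cluster name are placeholders; I can't vouch for every option's default, so check the plugin docs for your version):

```
output {
  elasticsearch {
    cluster   => "mycluster"      # placeholder cluster name
    host      => "172.17.51.10"   # which ES node to contact -- does NOT set the local bind address
    bind_host => "172.17.51.10"   # the local address Logstash binds its transport socket to
  }
}
```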

Regards,
Daniele

