Handshake failed, unexpected remote node (IP/port changes)

iquito · January 29, 2018, 1:12am

I am trying to change my elasticsearch.yml config so that Elasticsearch uses a TCP tunnel between my servers instead of a direct connection. I managed the same thing with MariaDB without problems.

For this I am attempting to just change the following:

server1:

network.host: 192.168.10.160
transport.tcp.port: 9984
discovery.zen.ping.unicast.hosts: ["192.168.10.160:9984", "192.168.10.161:9984"]

server2:

network.host: 192.168.10.161
transport.tcp.port: 9984
discovery.zen.ping.unicast.hosts: ["192.168.10.160:9984", "192.168.10.161:9984"]

to the following on both servers:

network.host: 127.0.0.1
transport.tcp.port: 9984
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9984", "127.0.0.1:9982"]

127.0.0.1:9982 is the TCP tunnel, leading to 127.0.0.1:9984 on the other server. So

127.0.0.1:9984 = always the current server
127.0.0.1:9982 = other server via TCP tunnel

One server starts up fine:

[2018-01-29T01:50:52,214][INFO ][o.e.t.TransportService   ] [server2] publish_address {127.0.0.1:9784}, bound_addresses {127.0.0.1:9784}
[2018-01-29T01:50:52,333][WARN ][o.e.t.n.Netty4Transport  ] [server2] send message failed [channel: org.elasticsearch.transport.netty4.NettyTcpChannel@28f15758]
java.nio.channels.ClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-01-29T01:50:55,317][INFO ][o.e.c.s.MasterService    ] [server2] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {server2}{Cl_vBCJrQumobozWnGkyHA}{lPqBzF_USqS4tUfVdyVXOg}{127.0.0.1}{127.0.0.1:9784}
[2018-01-29T01:50:55,321][INFO ][o.e.c.s.ClusterApplierService] [server2] new_master {server2}{Cl_vBCJrQumobozWnGkyHA}{lPqBzF_USqS4tUfVdyVXOg}{127.0.0.1}{127.0.0.1:9784}, reason: apply cluster state (from master [master {server2}{Cl_vBCJrQumobozWnGkyHA}{lPqBzF_USqS4tUfVdyVXOg}{127.0.0.1}{127.0.0.1:9784} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-29T01:50:55,346][INFO ][o.e.h.n.Netty4HttpServerTransport] [server2] publish_address {127.0.0.1:9683}, bound_addresses {127.0.0.1:9683}
[2018-01-29T01:50:55,347][INFO ][o.e.n.Node               ] [server2] started
[2018-01-29T01:50:56,009][INFO ][o.e.g.GatewayService     ] [server2] recovered [7] indices into cluster_state
[2018-01-29T01:50:56,341][INFO ][o.e.c.r.a.AllocationService] [server2] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[aaproducts_1510450266][0]] ...]).

while the other server floods my elasticsearch logs with:

[2018-01-29T01:51:49,218][WARN ][o.e.d.z.ZenDiscovery     ] [server1] failed to connect to master [{server2}{Cl_vBCJrQumobozWnGkyHA}{lPqBzF_USqS4tUfVdyVXOg}{127.0.0.1}{127.0.0.1:9784}], retrying...
org.elasticsearch.transport.ConnectTransportException: [server2][127.0.0.1:9784] handshake failed. unexpected remote node {server1}{ldNkwVs5RFSebpa3au2FdA}{sV9XkuK2SQ-9ZBc3UmRORA}{127.0.0.1}{127.0.0.1:9784}
	at org.elasticsearch.transport.TransportService.lambda$connectToNode$3(TransportService.java:331) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:514) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:327) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:314) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:515) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:483) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.discovery.zen.ZenDiscovery.access$2500(ZenDiscovery.java:90) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1253) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.2.jar:6.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

So server2 starts up, detects itself, and elects itself master (discovery.zen.minimum_master_nodes is set to 1 on both nodes). server1 then starts up and tries to connect to the master at the address the master published, which on server1 is its own IP and port combination (which doesn't make sense anyway). It notices that the node answering at that IP and port is server1 rather than server2, floods the logs with the handshake error, and does not elect itself master, so Elasticsearch never properly starts up on that node.
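
To illustrate what I think is going on, here is a minimal Python sketch of the situation. It is only a toy model, not Elasticsearch code: ROUTES, PUBLISHED, connect_to_node and try_join are names I made up, the ports are the ones from my config (9984 local, 9982 tunnel), and I am assuming that a joining node dials the address the master publishes and then checks which node actually answers.

```python
# Toy model of the tunnel setup; all names here are hypothetical
# and are not part of any Elasticsearch API.

LOCAL = "127.0.0.1:9984"   # each node's own transport address
TUNNEL = "127.0.0.1:9982"  # local tunnel endpoint leading to the other server

# Which node really answers, keyed by (host doing the dialing, address dialed).
ROUTES = {
    ("server1", LOCAL): "server1",
    ("server1", TUNNEL): "server2",
    ("server2", LOCAL): "server2",
    ("server2", TUNNEL): "server1",
}

# Address each node publishes to the cluster. Both bind to loopback,
# so both publish the same address.
PUBLISHED = {"server1": LOCAL, "server2": LOCAL}


def connect_to_node(local_host, expected_node):
    """Dial the expected node's published address and check who answers."""
    address = PUBLISHED[expected_node]
    actual_node = ROUTES[(local_host, address)]
    if actual_node != expected_node:
        raise ConnectionError(
            f"[{expected_node}][{address}] handshake failed. "
            f"unexpected remote node {actual_node}"
        )
    return actual_node


def try_join(local_host, master):
    """Return 'joined' on success, or the handshake error message."""
    try:
        connect_to_node(local_host, master)
        return "joined"
    except ConnectionError as exc:
        return str(exc)


if __name__ == "__main__":
    # server1 tries to join the master server2 using server2's published address.
    print(try_join("server1", "server2"))
```

In this model server1 can never reach server2 through the published address, because that address only means "server2" when dialed on server2 itself. The tunnel address 127.0.0.1:9982 would reach server2 from server1, but it is not the address the master publishes.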

I don't quite understand why this would happen. With the tunnel in between, it may be harder for Elasticsearch to detect whether the connection can be made, so on the server that works, this error message is logged:

[2018-01-29T02:32:45,183][WARN ][o.e.t.n.Netty4Transport  ] [server2] send message failed [channel: org.elasticsearch.transport.netty4.NettyTcpChannel@6ed323b7]
java.nio.channels.ClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]

Yet one of the two servers always goes crazy, doesn't "see itself" and expects the other server there, while the TCP proxy between them seems to work fine.

system · February 26, 2018, 1:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
