Cannot connect data node to master node

vinci · July 9, 2020, 7:27pm

Hello,

I'm configuring Elasticsearch 7.8.0 directly on Ubuntu 18.04.4 on two hosts. When the data node tries to connect to the master, it logs the following error over and over, and I really don't understand it:

[2020-07-09T19:18:22,026][WARN ][o.e.d.HandshakingTransportAddressConnector] [data1] [connectToRemoteMasterNode[192.168.99.18:9300]] completed handshake with [{master}{wvlA-fklQwGDM-5RGBHA6w}{iKjha53bTNqdgwaMxYdnYg}{10.0.2.15}{10.0.2.15:9300}{imr}{xpack.installed=true, transform.node=false}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [master][10.0.2.15:9300] handshake failed. unexpected remote node {data1}{B4osXO2xQVuUlPXF5y09pw}{YKBx1jeDSsORvmGHFt--Jg}{10.0.2.15}{10.0.2.15:9300}{dirt}{xpack.installed=true, transform.node=true}
	at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:388) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:157) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:475) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:465) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:213) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]

master's ip: 192.168.99.18
data node's ip: 192.168.99.19

master's yml configuration:

cluster.name: balkon
node.name: master
node.master: true
node.data: false
node.ml: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [_local_, _site_]
http.port: 9200
discovery.seed_hosts: ["127.0.0.1", "192.168.99.19"]
cluster.initial_master_nodes: ["balkon"]

Data node:

node.name: data1
node.master: false
node.data: true
node.ml: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [_local_, _site_]
http.port: 9200
discovery.seed_hosts: ["127.0.0.1", "192.168.99.18"]
cluster.initial_master_nodes: ["192.168.99.18"]

I'm not sure what I'm doing wrong exactly or what I should be checking for further on.
Any ideas are welcome!

Thank you

vinci · July 9, 2020, 8:24pm

These are Vagrant boxes, so each one gets a default IP of 10.0.2.15 for internet access (on a NAT network created by VirtualBox). The IPs mentioned above (192.168.99.18 and 192.168.99.19) belong to a bridged network, so they exist on the local network.
I think the problem might have been that 10.0.2.15 was the published IP of each node, and the same IP was being used on both Vagrant boxes. I can imagine how that creates a conflict, because under normal circumstances these two addresses shouldn't know about one another.
I'd really like it if someone could tell me whether that's the correct rationale.

Anyway, I changed the network hosts as follows:

network.host: [127.0.0.1, 192.168.99.18]

and:

network.host: [127.0.0.1, 192.168.99.19]

With that change, curl http://localhost:9200/_cat/nodes?v showed the two different IPs (192.168.99.18 and 192.168.99.19) instead of 10.0.2.15.
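
For reference, an alternative that should have the same effect, given here as an untested sketch: keep the original network.host value and pin only the published address with network.publish_host. The fragment below is for the master; the data node would use 192.168.99.19 instead.

```yaml
# elasticsearch.yml on the master (sketch, not tested in this thread)
# Bind to the loopback and site-local interfaces as before...
network.host: [_local_, _site_]
# ...but advertise the bridged address to other nodes instead of 10.0.2.15.
network.publish_host: 192.168.99.18
```

The idea is that other nodes then reconnect to an address that is unique per box, rather than to the NAT address both boxes share.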

But after a restart I got another error:

[2020-07-09T20:10:29,394][INFO ][o.e.c.c.JoinHelper       ] [data1] failed to join {master}{wvlA-fklQwGDM-5RGBHA6w}{3iXdfOb-QduJIknMmkUOKA}{192.168.99.18}{192.168.99.18:9300}{imr}{xpack.installed=true, transform.node=false} with JoinRequest{sourceNode={data1}{B4osXO2xQVuUlPXF5y09pw}{zWFYLLMzRI-7uDTWUHpBrQ}{192.168.99.19}{192.168.99.19:9300}{dirt}{xpack.installed=true, transform.node=true}, minimumTerm=8, optionalJoin=Optional[Join{term=8, lastAcceptedTerm=2, lastAcceptedVersion=30, sourceNode={data1}{B4osXO2xQVuUlPXF5y09pw}{zWFYLLMzRI-7uDTWUHpBrQ}{192.168.99.19}{192.168.99.19:9300}{dirt}{xpack.installed=true, transform.node=true}, targetNode={master}{wvlA-fklQwGDM-5RGBHA6w}{3iXdfOb-QduJIknMmkUOKA}{192.168.99.18}{192.168.99.18:9300}{imr}{xpack.installed=true, transform.node=false}}]}
org.elasticsearch.transport.RemoteTransportException: [master][192.168.99.18:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
        at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:512) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:235) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:636) ~[elasticsearch-7.8.0.jar:7.8.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [data1][192.168.99.19:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid WknxKhlxQ--LRunX_gdNyg than local cluster uuid tSlRMuhzRPyq5Mi5qrgGAg, rejecting

Since the data node was completely new, I deleted the contents of /var/lib/elasticsearch and then restarted the data node, and that worked: the data node joined the master node.
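
For anyone landing here with the same setup, below is a sketch of how the data node's elasticsearch.yml could look afterwards. It rests on two assumptions that are not confirmed in this thread: that cluster.name is set to balkon on the data node as well, and that the data node only needs the master as a seed host. According to the 7.x documentation, cluster.initial_master_nodes is meant to list the node names of master-eligible nodes and to be set only on those nodes for the very first cluster bootstrap, so it is left out here; on the master it would presumably be ["master"] rather than the cluster name.

```yaml
# elasticsearch.yml on data1 (sketch; cluster.name is assumed to match the master's)
cluster.name: balkon
node.name: data1
node.master: false
node.data: true
node.ml: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
# Bind and publish on the bridged address instead of the shared NAT address.
network.host: [127.0.0.1, 192.168.99.19]
http.port: 9200
# Only the master-eligible node is listed for discovery.
discovery.seed_hosts: ["192.168.99.18"]
# cluster.initial_master_nodes is intentionally omitted on this master-ineligible node.
```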

system · August 6, 2020, 8:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
