Cannot connect data node to master node

Hello,

I'm setting up Elasticsearch 7.8.0 directly on Ubuntu 18.04.4 on two hosts. When the data node tries to connect to the master, it logs the following error over and over, and I really don't understand it:

[2020-07-09T19:18:22,026][WARN ][o.e.d.HandshakingTransportAddressConnector] [data1] [connectToRemoteMasterNode[192.168.99.18:9300]] completed handshake with [{master}{wvlA-fklQwGDM-5RGBHA6w}{iKjha53bTNqdgwaMxYdnYg}{10.0.2.15}{10.0.2.15:9300}{imr}{xpack.installed=true, transform.node=false}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [master][10.0.2.15:9300] handshake failed. unexpected remote node {data1}{B4osXO2xQVuUlPXF5y09pw}{YKBx1jeDSsORvmGHFt--Jg}{10.0.2.15}{10.0.2.15:9300}{dirt}{xpack.installed=true, transform.node=true}
	at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:388) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:157) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:475) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:465) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:213) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]

master's ip: 192.168.99.18
data node's ip: 192.168.99.19

master's yml configuration:

cluster.name: balkon
node.name: master
node.master: true
node.data: false
node.ml: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [_local_, _site_]
http.port: 9200
discovery.seed_hosts: ["127.0.0.1", "192.168.99.19"]
cluster.initial_master_nodes: ["balkon"]

Data node:

node.name: data1
node.master: false
node.data: true
node.ml: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: [_local_, _site_]
http.port: 9200
discovery.seed_hosts: ["127.0.0.1", "192.168.99.18"]
cluster.initial_master_nodes: ["192.168.99.18"]

I'm not sure what I'm doing wrong or what I should check next.
Any ideas are welcome!

Thank you

These are Vagrant boxes, so each one uses the default IP 10.0.2.15 for internet access (it's on a NAT network created by VirtualBox). The IPs mentioned above (192.168.99.18 and 192.168.99.19) belong to a bridged network, so they exist on the local network.
I think the problem was that 10.0.2.15 was the published IP of each node, so the same IP was being used on both Vagrant boxes. That would match the log: the master advertised 10.0.2.15:9300, and when data1 connected to that address it reached itself ("unexpected remote node {data1}"). I can imagine how that creates a conflict, because under normal circumstances these addresses shouldn't know about one another.
I'd really like it if someone could tell me whether that's the correct rationale.

Anyway, I changed network.host as follows. On the master:

network.host: [127.0.0.1, 192.168.99.18]

and on the data node:

network.host: [127.0.0.1, 192.168.99.19]
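
An alternative that keeps the original bind addresses is to pin only the published address. The sketch below was not tried in this thread; it assumes the network.bind_host and network.publish_host settings behave as described in the Elasticsearch networking documentation. It shows the master, and the data node would use 192.168.99.19 instead.

```yaml
# elasticsearch.yml on the master (a sketch, not a verified config)
# Keep listening on loopback and site-local addresses, as before.
network.bind_host: [_local_, _site_]
# Advertise only the bridged-network address to other nodes,
# so the shared VirtualBox NAT address 10.0.2.15 is never published.
network.publish_host: 192.168.99.18
```

Either way, the point is the same: the address a node publishes has to be one that the other node can actually reach.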

After that, curl http://localhost:9200/_cat/nodes?v showed the two different IPs (192.168.99.18 and 192.168.99.19) instead of 10.0.2.15.

But after restarting I got another error:

[2020-07-09T20:10:29,394][INFO ][o.e.c.c.JoinHelper       ] [data1] failed to join {master}{wvlA-fklQwGDM-5RGBHA6w}{3iXdfOb-QduJIknMmkUOKA}{192.168.99.18}{192.168.99.18:9300}{imr}{xpack.installed=true, transform.node=false} with JoinRequest{sourceNode={data1}{B4osXO2xQVuUlPXF5y09pw}{zWFYLLMzRI-7uDTWUHpBrQ}{192.168.99.19}{192.168.99.19:9300}{dirt}{xpack.installed=true, transform.node=true}, minimumTerm=8, optionalJoin=Optional[Join{term=8, lastAcceptedTerm=2, lastAcceptedVersion=30, sourceNode={data1}{B4osXO2xQVuUlPXF5y09pw}{zWFYLLMzRI-7uDTWUHpBrQ}{192.168.99.19}{192.168.99.19:9300}{dirt}{xpack.installed=true, transform.node=true}, targetNode={master}{wvlA-fklQwGDM-5RGBHA6w}{3iXdfOb-QduJIknMmkUOKA}{192.168.99.18}{192.168.99.18:9300}{imr}{xpack.installed=true, transform.node=false}}]}
org.elasticsearch.transport.RemoteTransportException: [master][192.168.99.18:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
        at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:512) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:235) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:636) ~[elasticsearch-7.8.0.jar:7.8.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [data1][192.168.99.19:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid WknxKhlxQ--LRunX_gdNyg than local cluster uuid tSlRMuhzRPyq5Mi5qrgGAg, rejecting

Since the data node was completely new, I deleted the contents of /var/lib/elasticsearch and then restarted the data node. That worked: the data node joined the master node.
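
For reference, the settings from this thread can be collected into one sketch per node. Two things in it are assumptions rather than anything confirmed above. First, cluster.initial_master_nodes is set to the master's node.name, because the Elasticsearch documentation describes that setting as a list of master-eligible node names, whereas the original config used the cluster name; it is left out on the data node for the same reason. Second, cluster.name is added to the data node because it does not appear in the pasted config.

```yaml
# Master node, 192.168.99.18 (a sketch, not a verified config)
cluster.name: balkon
node.name: master
node.master: true
node.data: false
network.host: [127.0.0.1, 192.168.99.18]
discovery.seed_hosts: ["127.0.0.1", "192.168.99.19"]
# Assumption: the node.name of the master-eligible node, not the cluster name.
cluster.initial_master_nodes: ["master"]
---
# Data node, 192.168.99.19 (a sketch, not a verified config)
# Assumption: must match the master's cluster.name; not shown in the original paste.
cluster.name: balkon
node.name: data1
node.master: false
node.data: true
network.host: [127.0.0.1, 192.168.99.19]
discovery.seed_hosts: ["127.0.0.1", "192.168.99.18"]
# Assumption: cluster.initial_master_nodes is omitted here because
# this node is not master-eligible.
```

The two documents are separated by `---` only to keep them in one block; each belongs in its own elasticsearch.yml.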
