I'm configuring Elasticsearch 7.8.0 directly on Ubuntu 18.04.4 on two hosts. When the data node tries to connect to the master, I get the following error over and over, and I really don't understand it:
[2020-07-09T19:18:22,026][WARN ][o.e.d.HandshakingTransportAddressConnector] [data1] [connectToRemoteMasterNode[192.168.99.18:9300]] completed handshake with [{master}{wvlA-fklQwGDM-5RGBHA6w}{iKjha53bTNqdgwaMxYdnYg}{10.0.2.15}{10.0.2.15:9300}{imr}{xpack.installed=true, transform.node=false}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [master][10.0.2.15:9300] handshake failed. unexpected remote node {data1}{B4osXO2xQVuUlPXF5y09pw}{YKBx1jeDSsORvmGHFt--Jg}{10.0.2.15}{10.0.2.15:9300}{dirt}{xpack.installed=true, transform.node=true}
at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:388) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:157) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:475) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:465) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:213) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
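To check the "same published IP" idea against the log itself, here is a small Python sketch that pulls node descriptors out of log text and reports any transport address claimed by more than one node ID. The field layout {name}{nodeId}{ephemeralId}{host}{address}{roles} is inferred from the lines above, not taken from Elasticsearch documentation, and NODE_RE and find_address_conflicts are hypothetical names.

```python
import re
from collections import defaultdict

# One node descriptor as printed in the log, e.g.
# {master}{wvlA-...}{iKjha...}{10.0.2.15}{10.0.2.15:9300}{imr}
# Field names are an assumption inferred from that format.
NODE_RE = re.compile(
    r"\{(?P<name>[^{}]+)\}"
    r"\{(?P<node_id>[\w-]+)\}"
    r"\{(?P<ephemeral_id>[\w-]+)\}"
    r"\{(?P<host>[^{}]+)\}"
    r"\{(?P<address>[^{}]+:\d+)\}"
    r"\{(?P<roles>[a-z]*)\}"
)


def find_address_conflicts(log_text):
    """Return {transport address: sorted node names} for every address
    advertised by more than one distinct node ID in log_text."""
    by_address = defaultdict(dict)  # address -> {node_id: name}
    for match in NODE_RE.finditer(log_text):
        by_address[match["address"]][match["node_id"]] = match["name"]
    return {
        address: sorted(nodes.values())
        for address, nodes in by_address.items()
        if len(nodes) > 1
    }
```

Fed the first two log lines above, it reports 10.0.2.15:9300 as claimed by both data1 and master.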
Master's IP: 192.168.99.18
Data node's IP: 192.168.99.19
Since these are Vagrant boxes, each one also has a default IP, 10.0.2.15, used for internet access (a NAT network created by VirtualBox). The IPs above (192.168.99.18 and 192.168.99.19) belong to a bridged network, so they are reachable on the local network.
I think the problem may have been that 10.0.2.15 was the published IP of each node, and that same IP exists on both Vagrant boxes. If I read the log correctly, both master and data1 advertise 10.0.2.15:9300, so when data1 opens the follow-up connection to the master's published address, it ends up talking to itself, hence "unexpected remote node {data1}". I can see how that would cause a conflict, because under normal circumstances these NAT addresses shouldn't know about each other.
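If that rationale holds, the fix is to make each node publish its bridged address instead of the NAT one. The Python sketch below picks the candidate IP that falls inside the bridged subnet and renders matching elasticsearch.yml lines. network.host and discovery.seed_hosts are real Elasticsearch settings; the 192.168.99.0/24 subnet, the candidate list (for example what hostname -I prints on the box) and the helper names are assumptions for illustration.

```python
import ipaddress

# Assumption: the bridged network is 192.168.99.0/24.
BRIDGED_NET = ipaddress.ip_network("192.168.99.0/24")


def pick_publish_ip(candidates, network=BRIDGED_NET):
    """Return the first candidate IP that lies inside the bridged network."""
    for ip in candidates:
        if ipaddress.ip_address(ip) in network:
            return ip
    raise ValueError(f"no address in {network} among {candidates}")


def render_network_config(node_name, candidates, seed_hosts):
    """Render elasticsearch.yml lines that bind and publish on the bridged IP."""
    ip = pick_publish_ip(candidates)
    seeds = ", ".join(f'"{host}"' for host in seed_hosts)
    return "\n".join([
        f"node.name: {node_name}",
        f"network.host: {ip}",
        f"discovery.seed_hosts: [{seeds}]",
    ])
```

For data1, render_network_config("data1", ["10.0.2.15", "192.168.99.19"], ["192.168.99.18"]) yields network.host: 192.168.99.19 and discovery.seed_hosts: ["192.168.99.18"]. network.host both binds and publishes on that address; to keep listening on every interface, network.publish_host changes only the advertised address.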
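But I'd really like someone to confirm whether that's the correct rationale.
After changing the setup so that each node publishes its bridged IP (192.168.99.18 for the master and 192.168.99.19 for data1, as the log below shows), the handshake goes through, but the data node now fails to join the master: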
[2020-07-09T20:10:29,394][INFO ][o.e.c.c.JoinHelper ] [data1] failed to join {master}{wvlA-fklQwGDM-5RGBHA6w}{3iXdfOb-QduJIknMmkUOKA}{192.168.99.18}{192.168.99.18:9300}{imr}{xpack.installed=true, transform.node=false} with JoinRequest{sourceNode={data1}{B4osXO2xQVuUlPXF5y09pw}{zWFYLLMzRI-7uDTWUHpBrQ}{192.168.99.19}{192.168.99.19:9300}{dirt}{xpack.installed=true, transform.node=true}, minimumTerm=8, optionalJoin=Optional[Join{term=8, lastAcceptedTerm=2, lastAcceptedVersion=30, sourceNode={data1}{B4osXO2xQVuUlPXF5y09pw}{zWFYLLMzRI-7uDTWUHpBrQ}{192.168.99.19}{192.168.99.19:9300}{dirt}{xpack.installed=true, transform.node=true}, targetNode={master}{wvlA-fklQwGDM-5RGBHA6w}{3iXdfOb-QduJIknMmkUOKA}{192.168.99.18}{192.168.99.18:9300}{imr}{xpack.installed=true, transform.node=false}}]}
org.elasticsearch.transport.RemoteTransportException: [master][192.168.99.18:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:512) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:235) ~[elasticsearch-7.8.0.jar:7.8.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:636) ~[elasticsearch-7.8.0.jar:7.8.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [data1][192.168.99.19:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid WknxKhlxQ--LRunX_gdNyg than local cluster uuid tSlRMuhzRPyq5Mi5qrgGAg, rejecting
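The last Caused by line describes a check run on data1: it compares the cluster UUID in the master's state with the one data1 has stored locally. Below is a simplified Python model of that rule, not Elasticsearch's actual code. LocalState, validate_join and CoordinationStateRejected are hypothetical names, and the uuid_committed flag reflects my understanding that 7.x only enforces a committed local UUID; treat that part as an assumption. The model shows why a node with an emptied data path (no stored UUID) no longer trips the check.

```python
from dataclasses import dataclass
from typing import Optional


class CoordinationStateRejected(Exception):
    """Stand-in for Elasticsearch's CoordinationStateRejectedException."""


@dataclass
class LocalState:
    cluster_uuid: Optional[str]  # None models an empty data path
    uuid_committed: bool         # assumption: only a committed UUID is enforced


def validate_join(local: LocalState, incoming_cluster_uuid: str) -> None:
    """Simplified model of the join validation the joining node performs.

    Raises if this node has already committed to a different cluster.
    """
    if local.uuid_committed and local.cluster_uuid != incoming_cluster_uuid:
        raise CoordinationStateRejected(
            "join validation on cluster state with a different cluster uuid "
            f"{incoming_cluster_uuid} than local cluster uuid "
            f"{local.cluster_uuid}, rejecting"
        )
```

With LocalState("tSlRMuhzRPyq5Mi5qrgGAg", True) the master's UUID is rejected, while LocalState(None, False) accepts it.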
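Since the data node was completely new, I deleted the contents of /var/lib/elasticsearch and restarted it, and that worked: the data node joined the master. The rejection meant data1's data directory already held state for a different cluster (UUID tSlRMuhzRPyq5Mi5qrgGAg) than the master's (WknxKhlxQ--LRunX_gdNyg), so wiping it let the node join as a fresh member.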
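To confirm the join from outside the nodes, a Python sketch along these lines queries the _cat/nodes API and checks that every expected node is present with its own IP. It assumes HTTP on the default port 9200 with security not enabled; fetch_cat_nodes and check_cluster are hypothetical helpers, and check_cluster works on already-fetched rows, so it can be tried without a cluster.

```python
import json
import urllib.request


def fetch_cat_nodes(base_url="http://192.168.99.18:9200"):
    """Fetch /_cat/nodes as JSON (assumes HTTP on 9200 and no security)."""
    url = base_url + "/_cat/nodes?format=json&h=ip,name,node.role,master"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def check_cluster(rows, expected_names):
    """Return a list of problems in _cat/nodes rows; empty means healthy."""
    problems = []
    missing = set(expected_names) - {row["name"] for row in rows}
    if missing:
        problems.append(f"missing nodes: {sorted(missing)}")
    ips = [row["ip"] for row in rows]
    shared = sorted({ip for ip in ips if ips.count(ip) > 1})
    if shared:
        problems.append(f"nodes sharing an IP: {shared}")
    return problems
```

Calling check_cluster(fetch_cat_nodes(), ["master", "data1"]) against the running cluster should return an empty list once data1 has joined on 192.168.99.19.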