Failed to join cluster

Hi, I removed a node (logs-data-2), upgraded Elasticsearch to 7.2.0, and then ran into this issue.

Master node:

[2019-06-26T07:46:39,403][WARN ][o.e.c.c.Coordinator      ] [logs-master-1] failed to validate incoming join request from node [{logs-data-1}{CbSAF_DDT869p9o4x9zFhg}{Q2YO1CC6R3G07W6BX8az5w}{10.30.2.12}{10.30.2.12:9300}{ml.machine_memory=8352534528, ml.max_open_jobs=20, xpack.installed=true}]
org.elasticsearch.transport.RemoteTransportException: [logs-data-1][10.10.52.12:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid PA4WbJSeQRup_AuawhMHtw than local cluster uuid aBTiZpcrSb-zBePmGgr2bA, rejecting
	at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:267) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
	at java.lang.Thread.run(Thread.java:748) [?:?]
[2019-06-26T07:46:39,504][WARN ][o.e.c.c.Coordinator      ] [logs-master-1] failed to validate incoming join request from node [{logs-data-3}{PRn-LgtNS7mnvhwhtMYb-g}{AyIGUaaXSJaXrfHjT95tjw}{10.30.2.14}{10.30.2.14:9300}{ml.machine_memory=8339877888, ml.max_open_jobs=20, xpack.installed=true}]

Data node:

[2019-06-26T07:43:56,341][INFO ][o.e.c.c.JoinHelper       ] [logs-data-1] failed to join {logs-master-1}{l02tG5PlQWCGH9ZiO6R8fA}{pkf7Q4u6TCCNdDrbRmYI0Q}{10.30.2.15}{10.30.2.15:9300}{ml.machine_memory=2065694720, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={logs-data-1}{CbSAF_DDT869p9o4x9zFhg}{Q2YO1CC6R3G07W6BX8az5w}{10.30.2.12}{10.30.2.12:9300}{ml.machine_memory=8352534528, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [logs-master-1][10.30.2.15:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
	at org.elasticsearch.cluster.coordination.Coordinator$3.onFailure(Coordinator.java:500) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.cluster.coordination.JoinHelper$5.handleException(JoinHelper.java:359) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1111) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:246) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:835) [?:1.8.0_212]
Caused by: org.elasticsearch.transport.RemoteTransportException: [logs-data-1][10.30.2.12:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid PA4WbJSeQRup_AuawhMHtw than local cluster uuid aBTiZpcrSb-zBePmGgr2bA, rejecting
	at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:267) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_212]

Is there any way I can fix this state? I assumed that the data nodes would just rejoin without a problem, since the cluster was operating fine when I removed one of the data nodes. Thanks!

This node is trying to join a different cluster from the one it previously belonged to, which indicates that you've got something misconfigured somewhere.
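To see which two clusters are involved, the UUIDs can be pulled out of the rejection message itself. Below is a minimal sketch in Python; the function name and the regular expression are illustrative, not part of Elasticsearch. It assumes the message keeps the wording shown in the logs above, and that the first UUID is the one in the cluster state sent by the master while the "local" UUID is the one the joining node already has on disk.

```python
import re

# Pattern for the rejection message seen in the logs above.
# Assumption: the first UUID belongs to the incoming (master's) cluster state,
# the second is the UUID stored locally on the joining node.
UUID_MISMATCH = re.compile(
    r"different cluster uuid (?P<incoming>\S+) "
    r"than local cluster uuid (?P<local>[^\s,]+)"
)


def parse_uuid_mismatch(line):
    """Return (incoming_uuid, local_uuid), or None if the line does not match."""
    match = UUID_MISMATCH.search(line)
    if match is None:
        return None
    return match.group("incoming"), match.group("local")


# Message text copied from the log excerpt in this thread.
line = (
    "CoordinationStateRejectedException: join validation on cluster state "
    "with a different cluster uuid PA4WbJSeQRup_AuawhMHtw than local cluster "
    "uuid aBTiZpcrSb-zBePmGgr2bA, rejecting"
)

incoming, local = parse_uuid_mismatch(line)
print("incoming:", incoming)
print("local:   ", local)
```

If the two values differ, as they do here, the joining node's data folder was written by a different cluster than the one the master belongs to.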

I have decided to delete all data and restore the indices from a snapshot.

However, if I deleted a data node and upgraded every node from Elasticsearch 7.1.1 to 7.2.0, is that enough to give the nodes a different cluster state? The cluster was fully functional before I restarted the nodes, and I didn't touch the configuration at all. Which step did I miss? Thanks!

No, upgrading a cluster doesn't cause this to happen. The cluster UUID is stored in the data folder, so as long as you use the same data folders after the upgrade you should have the same cluster UUID.
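One way to check this is to compare the `cluster_uuid` field that each node reports in its `GET /` response. The sketch below is only illustrative: the helper name is made up, and the responses are hand-written stand-ins rather than real output, using the two UUIDs from the log messages above under the assumption that the first belongs to the master and the second to logs-data-1. In practice you would collect the real responses from each node over HTTP.

```python
from collections import defaultdict


def group_by_cluster_uuid(root_responses):
    """Group node names by the cluster_uuid each node reports in GET /."""
    groups = defaultdict(list)
    for response in root_responses:
        groups[response["cluster_uuid"]].append(response["name"])
    return dict(groups)


# Hypothetical, trimmed-down GET / responses (real ones contain more fields).
responses = [
    {"name": "logs-master-1", "cluster_uuid": "PA4WbJSeQRup_AuawhMHtw"},
    {"name": "logs-data-1", "cluster_uuid": "aBTiZpcrSb-zBePmGgr2bA"},
]

groups = group_by_cluster_uuid(responses)
for uuid, nodes in groups.items():
    print(uuid, "->", ", ".join(nodes))

if len(groups) > 1:
    print("Nodes disagree on the cluster UUID; check which data folder each node is using.")
```

More than one group means at least one node is starting from a data folder that belongs to a different cluster, for example because its data path now points at a fresh or different directory.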

Strange. I guess it has something to do with when I deleted a data node. Thanks anyway!
