Failed to join cluster

aqiank · June 26, 2019, 7:53am

Hi, I removed a node (logs-data-2) and upgraded Elasticsearch to 7.2.0, and now the remaining data nodes fail to join the cluster.

Master node:

[2019-06-26T07:46:39,403][WARN ][o.e.c.c.Coordinator      ] [logs-master-1] failed to validate incoming join request from node [{logs-data-1}{CbSAF_DDT869p9o4x9zFhg}{Q2YO1CC6R3G07W6BX8az5w}{10.30.2.12}{10.30.2.12:9300}{ml.machine_memory=8352534528, ml.max_open_jobs=20, xpack.installed=true}]
org.elasticsearch.transport.RemoteTransportException: [logs-data-1][10.10.52.12:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid PA4WbJSeQRup_AuawhMHtw than local cluster uuid aBTiZpcrSb-zBePmGgr2bA, rejecting
	at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:267) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
	at java.lang.Thread.run(Thread.java:748) [?:?]
[2019-06-26T07:46:39,504][WARN ][o.e.c.c.Coordinator      ] [logs-master-1] failed to validate incoming join request from node [{logs-data-3}{PRn-LgtNS7mnvhwhtMYb-g}{AyIGUaaXSJaXrfHjT95tjw}{10.30.2.14}{10.30.2.14:9300}{ml.machine_memory=8339877888, ml.max_open_jobs=20, xpack.installed=true}]

Data node:

[2019-06-26T07:43:56,341][INFO ][o.e.c.c.JoinHelper       ] [logs-data-1] failed to join {logs-master-1}{l02tG5PlQWCGH9ZiO6R8fA}{pkf7Q4u6TCCNdDrbRmYI0Q}{10.30.2.15}{10.30.2.15:9300}{ml.machine_memory=2065694720, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={logs-data-1}{CbSAF_DDT869p9o4x9zFhg}{Q2YO1CC6R3G07W6BX8az5w}{10.30.2.12}{10.30.2.12:9300}{ml.machine_memory=8352534528, xpack.installed=true, ml.max_open_jobs=20}, optionalJoin=Optional.empty}
org.elasticsearch.transport.RemoteTransportException: [logs-master-1][10.30.2.15:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
	at org.elasticsearch.cluster.coordination.Coordinator$3.onFailure(Coordinator.java:500) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.cluster.coordination.JoinHelper$5.handleException(JoinHelper.java:359) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1111) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:246) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688) ~[elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:835) [?:1.8.0_212]
Caused by: org.elasticsearch.transport.RemoteTransportException: [logs-data-1][10.30.2.12:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid PA4WbJSeQRup_AuawhMHtw than local cluster uuid aBTiZpcrSb-zBePmGgr2bA, rejecting
	at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:308) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:267) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) ~[elasticsearch-7.2.0.jar:7.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.2.0.jar:7.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_212]

Is there any way I can fix this state? I assumed the data nodes would simply rejoin, since the cluster was operating fine when I removed one of the data nodes. Thanks!

DavidTurner · June 26, 2019, 10:26am

This node is trying to join a different cluster from the one it previously belonged to, which indicates that something is misconfigured somewhere.
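
To see which two cluster UUIDs are in conflict, the rejection message quoted in the logs can be pulled apart. This is only a minimal sketch: the function name and the regular expression are illustrative and not part of Elasticsearch. Because the exception is raised on logs-data-1 during join validation, "local" appears to be the UUID held by the data node and the other value the UUID of the cluster state sent by the master.

```python
import re

# Matches the CoordinationStateRejectedException message seen in the logs.
# Cluster UUIDs are assumed here to consist of letters, digits, "-" and "_".
_UUID_MISMATCH = re.compile(
    r"different cluster uuid (?P<remote>[\w-]+) "
    r"than local cluster uuid (?P<local>[\w-]+)"
)


def parse_uuid_mismatch(line):
    """Return (remote_uuid, local_uuid) from a rejection message, or None."""
    match = _UUID_MISMATCH.search(line)
    if match is None:
        return None
    return match.group("remote"), match.group("local")


if __name__ == "__main__":
    message = (
        "join validation on cluster state with a different cluster uuid "
        "PA4WbJSeQRup_AuawhMHtw than local cluster uuid "
        "aBTiZpcrSb-zBePmGgr2bA, rejecting"
    )
    print(parse_uuid_mismatch(message))
```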

aqiank · June 26, 2019, 10:57am

I have decided to delete all data and restore the indices from a snapshot.

However, if I deleted a data node and upgraded every node from Elasticsearch 7.1.1 to 7.2.0, is that enough to give it a different cluster state? It was fully functional before I restarted the nodes, and I didn't touch the configuration at all. Which step did I miss? Thanks!

DavidTurner · June 26, 2019, 11:05am

No, upgrading a cluster doesn't cause this to happen. The cluster UUID is stored in the data folder, so as long as you use the same data folders after the upgrade you should have the same cluster UUID.
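
One way to check this is to compare the cluster UUID each node reports and see whether the nodes fall into more than one group. The sketch below assumes each node's HTTP root endpoint (`GET /`) is reachable and returns a `cluster_uuid` field. The helper names are hypothetical. With security enabled, as in this cluster, the request would also need credentials, which are left out here. A node that has never joined a cluster may report a placeholder value rather than a real UUID.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_cluster_uuid(base_url, timeout=5.0):
    """Read cluster_uuid from a node's root endpoint (no authentication)."""
    with urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
        return json.load(resp)["cluster_uuid"]


def group_by_cluster_uuid(uuids_by_node):
    """Map each cluster UUID to the sorted list of nodes reporting it."""
    groups = defaultdict(list)
    for node, uuid in sorted(uuids_by_node.items()):
        groups[uuid].append(node)
    return dict(groups)
```

If `group_by_cluster_uuid` returns more than one key, the nodes hold state from different clusters, which matches the rejection in the logs. The input mapping could be built by calling `fetch_cluster_uuid` once per node address.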

aqiank · June 27, 2019, 2:03am

Strange. I guess it has something to do with when I deleted a data node. Thanks anyway!

system · July 25, 2019, 2:03am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
