Failed Node Join in Elasticsearch Cluster "jaeger-es" due to Cluster UUID Mismatch

Hello,

I am encountering an issue where one of the data nodes in my Elasticsearch cluster is unable to join the cluster. The cluster name is "jaeger-es" and the problematic node is "jaeger-es-data-0". I found an error in the logs which indicates that the node join request failed.

Here is a simplified explanation of the log message:

The data node "jaeger-es-data-0" attempted to join the cluster via the master node "jaeger-es-master-2". However, the operation failed because of a mismatch in the cluster UUID between the local node and the cluster state: the UUID of the cluster state was different from that of the local node, so the join request was rejected.

Here is the root cause identified from the stack trace:

  • Error: "join validation on cluster state with a different cluster uuid irQoA6X5S2W_-RzSC--SqQ than local cluster uuid ijwlAUY5SxKJ92yNxIf5Jw, rejecting"

I would appreciate it if you could assist me in resolving this issue. Please let me know the necessary steps to fix the Cluster UUID mismatch and any other related configurations that may need to be updated.
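For reference, here is roughly how I compared the cluster UUID reported by each node. This is only a small Python sketch; the node addresses, plain HTTP on port 9200, and the absence of authentication are assumptions about my own setup and will need adjusting:

import json
import urllib.request

# Adjust these to the actual node addresses; HTTP on 9200 and no auth are assumptions.
NODES = [
    "http://jaeger-es-master-0:9200",
    "http://jaeger-es-master-1:9200",
    "http://jaeger-es-master-2:9200",
    "http://jaeger-es-data-0:9200",
]

for url in NODES:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            info = json.load(resp)
        # The root endpoint of every node reports the cluster UUID it currently belongs to.
        print(f"{info['name']}: cluster_uuid={info['cluster_uuid']}")
    except Exception as exc:  # unreachable node, auth required, etc.
        print(f"{url}: request failed ({exc})")

Any node that prints a different cluster_uuid from the others is the one whose on-disk state no longer matches the rest of the cluster.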

Thank you for your time and assistance.

Issue with Cluster Node Connectivity and Data Retrieval in Elasticsearch Cluster

I am currently experiencing issues with my Elasticsearch cluster, which consists of 3 master nodes, 6 data nodes, and 2 client nodes. I've noticed that only one client pod and one master pod are running as expected, while the rest seem to be facing connectivity issues. I also want to retrieve old data stored in the cluster.

{"type": "server", "timestamp": "2023-10-27T08:11:35,736Z", "level": "WARN", "component": "o.e.c.c.Coordinator", "cluster.name": "jaeger-es", "node.name": "jaeger-es-master-2", "message": "failed to validate incoming join request from node [{jaeger-es-data-0}{u4PG-FUSRDiHkc5VxCgSfQ}{rwiGfHmuRg-0adz2t7DRfA}{10.42.10.32}{10.42.10.32:9300}{cdhistw}{xpack.installed=true, transform.node=true}]", "cluster.uuid": "irQoA6X5S2W_-RzSC--SqQ", "node.id": "c-KOM5BaRrKLtzyweQ0vXg" ,

"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [jaeger-es-data-0][10.42.10.32:9300][internal:cluster/coordination/join/validate]",

"Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid irQoA6X5S2W_-RzSC--SqQ than local cluster uuid ijwlAUY5SxKJ92yNxIf5Jw, rejecting",

"at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$5(JoinHelper.java:164) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]",

"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]",

"at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:305) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",

"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",

"at java.lang.Thread.run(Thread.java:832) [?:?]"] }


As you can see, the probe is failing because of the UUID mismatch. Can someone please help me with this? I am willing to get on a call or do a screen share.

I want all the pods to be up and running and able to communicate with each other. There is no network issue on the cluster side, because the same application works fine in another namespace. All I want is for all the pods to be up and running and to be able to see the old data; this is highly critical data for us.

Current cluster status: I am only able to see 2 nodes.

As part of troubleshooting, I completely removed the cluster, added it again, and restarted the full cluster, but it still didn't work. I have been working on this for a long time and could not figure out what the problem is. I kindly request that someone please help me.

What is the configuration of the master and data nodes in the cluster?

It sounds like you are running on k8s. Do all master and data nodes have persistent storage? Given that you get new UUIDs generated when restarting I suspect this may not be the case and that your cluster is also not correctly configured. If so, you may need to restore the data from a recent snapshot.
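If you want to verify, something like the following would show whether each master and data pod actually has a bound persistent volume claim. It is only a rough sketch using the official Kubernetes Python client; the namespace name is an assumption, and kubectl get pvc in the right namespace gives the same information:

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "jaeger"  # assumption: the namespace where the jaeger-es chart is installed

# Each Elasticsearch StatefulSet pod should have its own Bound PVC; a missing or
# unbound claim means that node starts with an empty data path and a fresh cluster UUID.
for pvc in v1.list_namespaced_persistent_volume_claim(NAMESPACE).items:
    print(
        f"{pvc.metadata.name}: phase={pvc.status.phase}, "
        f"volume={pvc.spec.volume_name}, "
        f"storage_class={pvc.spec.storage_class_name}"
    )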

I don't have any snapshot created. I built a highly available cluster using the Helm chart for ES 7.10.0 with PVs.



My Elasticsearch cluster worked fine for more than 40 days; on a daily basis it used to write 10 GB of data per index. Unfortunately, for some reason it is not working now.

I have a huge amount of critical data, and at this point I haven't taken any snapshot of the previous cluster. That's why I am so worried; that data is very important for us.

I would recommend sharing the full configuration and not just screenshots of part of it. I will have to leave it for someone with more k8s experience to help out though.
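For example, the node roles and the cluster formation settings as the running nodes see them can be dumped as plain text with something like this (only a sketch; the port-forwarded localhost:9200 endpoint and the absence of auth/TLS are assumptions, so adjust for your deployment):

import json
import urllib.request

BASE = "http://localhost:9200"  # e.g. after port-forwarding to any reachable pod

# /_nodes/settings lists every node that has joined, with its roles and settings.
with urllib.request.urlopen(f"{BASE}/_nodes/settings", timeout=10) as resp:
    nodes = json.load(resp)["nodes"]

for node_id, node in nodes.items():
    settings = node.get("settings", {})
    print(node["name"], node.get("roles", []))
    # The discovery / cluster bootstrap settings are the interesting part here.
    print("  cluster.initial_master_nodes:", settings.get("cluster", {}).get("initial_master_nodes"))
    print("  discovery.seed_hosts:", settings.get("discovery", {}).get("seed_hosts"))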

I tried to upload a zip file, but the forum is not allowing me to send it to you.

Please do the needful. Thanks.

This is a community forum, not a support portal. Everyone here is volunteering their time and effort, so there is no SLA, or even a guarantee of getting an answer.

I saw that some important pieces of information were left out of the initial posts and pointed this out so others may be able to help more easily, but troubleshooting k8s and Helm charts is not my speciality, which is why I will need to leave that for someone else.

ohh got it.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.