Failed Node Join in Elasticsearch Cluster "jaeger-es" due to Cluster UUID Mismatch

Hello,

I am encountering an issue where one of the data nodes in my Elasticsearch cluster is unable to join the cluster. The cluster name is "jaeger-es" and the problematic node is "jaeger-es-data-0". I found an error in the logs which indicates that the node join request failed.

Here is a simplified explanation of the log message:

The data node "jaeger-es-data-0" attempted to join the cluster via the master node "jaeger-es-master-2". However, the operation failed because of a mismatch in the cluster UUID between the local node and the cluster state: the UUID of the cluster state was different from that of the local node, so the join request was rejected.

Here is the root cause identified from the stack trace:

  • Error: "join validation on cluster state with a different cluster uuid irQoA6X5S2W_-RzSC--SqQ than local cluster uuid ijwlAUY5SxKJ92yNxIf5Jw, rejecting"

I would appreciate it if you could assist me in resolving this issue. Please let me know the necessary steps to fix the Cluster UUID mismatch and any other related configurations that may need to be updated.
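For reference, here is roughly how I compared the cluster UUID reported by each node. This is only a small Python sketch; the node addresses, plain HTTP on port 9200, and the absence of authentication are assumptions about my own setup and will need adjusting:

import json
import urllib.request

# Adjust these to the actual node addresses; HTTP on 9200 and no auth are assumptions.
NODES = [
    "http://jaeger-es-master-0:9200",
    "http://jaeger-es-master-1:9200",
    "http://jaeger-es-master-2:9200",
    "http://jaeger-es-data-0:9200",
]

for url in NODES:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            info = json.load(resp)
        # The root endpoint of every node reports the cluster UUID it currently belongs to.
        print(f"{info['name']}: cluster_uuid={info['cluster_uuid']}")
    except Exception as exc:  # unreachable node, auth required, etc.
        print(f"{url}: request failed ({exc})")

Any node that prints a different cluster_uuid from the others is the one whose on-disk state no longer matches the rest of the cluster.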

Thank you for your time and assistance.

Issue with Cluster Node Connectivity and Data Retrieval in Elasticsearch Cluster

I am currently experiencing issues with my Elasticsearch cluster, which consists of 3 master nodes, 6 data nodes, and 2 client nodes. I've noticed that only one client pod and one master pod are running as expected, while the rest seem to be facing connectivity issues. I also want to retrieve old data stored in the cluster.

{"type": "server", "timestamp": "2023-10-27T08:11:35,736Z", "level": "WARN", "component": "o.e.c.c.Coordinator", "cluster.name": "jaeger-es", "node.name": "jaeger-es-master-2", "message": "failed to validate incoming join request from node [{jaeger-es-data-0}{u4PG-FUSRDiHkc5VxCgSfQ}{rwiGfHmuRg-0adz2t7DRfA}{10.42.10.32}{10.42.10.32:9300}{cdhistw}{xpack.installed=true, transform.node=true}]", "cluster.uuid": "irQoA6X5S2W_-RzSC--SqQ", "node.id": "c-KOM5BaRrKLtzyweQ0vXg" ,

"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [jaeger-es-data-0][10.42.10.32:9300][internal:cluster/coordination/join/validate]",

"Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid irQoA6X5S2W_-RzSC--SqQ than local cluster uuid ijwlAUY5SxKJ92yNxIf5Jw, rejecting",

"at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$5(JoinHelper.java:164) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]",

"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]",

"at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:305) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.10.0.jar:7.10.0]",

"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",

"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",

"at java.lang.Thread.run(Thread.java:832) [?:?]"] }


As you can see, the probe is failing because of the UUID mismatch. Can someone please help me with this? I am willing to get on a call or do a screen share.

I want all the pods to be up and running and able to communicate with each other. There is no network issue on the cluster side, because the same application works fine in another namespace. All I want is for all the pods to be up and running and to be able to see the old data; this is highly critical data for us.

Current cluster status: I am only able to see 2 nodes.

As part of troubleshooting, I completely removed the cluster, added it again, and restarted the full cluster, but it still didn't work. I have been working on this for a long time and could not figure out what the problem is. I kindly request that someone please help me.

What is the configuration of the master and data nodes in the cluster?

It sounds like you are running on k8s. Do all master and data nodes have persistent storage? Given that you get new UUIDs generated when restarting I suspect this may not be the case and that your cluster is also not correctly configured. If so, you may need to restore the data from a recent snapshot.
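If you want to verify, something like the following would show whether each master and data pod actually has a bound persistent volume claim. It is only a rough sketch using the official Kubernetes Python client; the namespace name is an assumption, and kubectl get pvc in the right namespace gives the same information:

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "jaeger"  # assumption: the namespace where the jaeger-es chart is installed

# Each Elasticsearch StatefulSet pod should have its own Bound PVC; a missing or
# unbound claim means that node starts with an empty data path and a fresh cluster UUID.
for pvc in v1.list_namespaced_persistent_volume_claim(NAMESPACE).items:
    print(
        f"{pvc.metadata.name}: phase={pvc.status.phase}, "
        f"volume={pvc.spec.volume_name}, "
        f"storage_class={pvc.spec.storage_class_name}"
    )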

I don't have any snapshot created. I built a highly available cluster using the Helm chart for ES 7.10.0 with PVs.



My Elasticsearch cluster worked fine for more than 40 days; on a daily basis it used to write 10 GB of data per index. Unfortunately, for some reason it is not working now.

I have a huge amount of critical data, and at this point I haven't taken any snapshot of the previous cluster. That's why I am so worried; that data is very important for us.

I would recommend sharing the full configuration and not just screenshots of part of it. I will have to leave it for someone with more k8s experience to help out though.
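For example, the node roles and the cluster formation settings as the running nodes see them can be dumped as plain text with something like this (only a sketch; the port-forwarded localhost:9200 endpoint and the absence of auth/TLS are assumptions, so adjust for your deployment):

import json
import urllib.request

BASE = "http://localhost:9200"  # e.g. after port-forwarding to any reachable pod

# /_nodes/settings lists every node that has joined, with its roles and settings.
with urllib.request.urlopen(f"{BASE}/_nodes/settings", timeout=10) as resp:
    nodes = json.load(resp)["nodes"]

for node_id, node in nodes.items():
    settings = node.get("settings", {})
    print(node["name"], node.get("roles", []))
    # The discovery / cluster bootstrap settings are the interesting part here.
    print("  cluster.initial_master_nodes:", settings.get("cluster", {}).get("initial_master_nodes"))
    print("  discovery.seed_hosts:", settings.get("discovery", {}).get("seed_hosts"))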

I tried to upload a zip file, but the forum is not allowing me to send it to you.

Please do the needful. Thanks.

This is a community forum, not a support portal. Everyone here is volunteering their time and effort, so there is no SLA, or even a guarantee of getting an answer.

I saw that some important pieces of information were left out of the initial posts and pointed this out so others may be able to help more easily, but troubleshooting k8s and Helm charts is not my speciality, which is why I will need to leave that for someone else.

ohh got it.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.