Restored data to fresh cluster incorrectly and receiving "master not discovered or elected yet"

Hi

I had a 3-node cluster running on Kubernetes. There were space issues with the volumes, so I was forced to provision new ones. Before I read this article I decided to make a copy of the data directory, then I created a new 3-node cluster and restored the data directory into it. I do of course realise now that this was not the correct way to do this, but I am hoping it is not too late to recover my data.

This is the error I receive now:

{"type": "server", "timestamp": "2020-08-11T17:55:53,439Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "k8s-logs", "node.name": "es-cluster-2", "message": "master not discovered or elected yet, an election requires at least 3 nodes with ids from [61_bqJv4Q1CqUYnXR4SliQ, DxDiXlTCQw2r86hvOgFZTA, CH9ToyI_Siqmu41a8LdecQ, aowKPk47SMeK0k9nph6yoA, 3eWboZMGSoOmM0UA49SsJA], have discovered [{es-cluster-2}{61_bqJv4Q1CqUYnXR4SliQ}{QOai1LR8Rb6n-0umaedE7g}{10.10.2.236}{10.10.2.236:9300}{dilmrt}{ml.machine_memory=2799996928, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, {es-cluster-1}{3eWboZMGSoOmM0UA49SsJA}{Bgw_mExBQQiPdM8Oo1g1Hw}{10.10.1.98}{10.10.1.98:9300}{dilmrt}{ml.machine_memory=2799996928, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}, {es-cluster-0}{CH9ToyI_Siqmu41a8LdecQ}{Y8L2EahpTf6tCYHRBe_czQ}{10.10.3.64}{10.10.3.64:9300}{dilmrt}{ml.machine_memory=2799996928, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}] which is a quorum; discovery will continue using [10.10.3.64:9300, 10.10.1.98:9300] from hosts providers and [{es-cluster-2}{61_bqJv4Q1CqUYnXR4SliQ}{QOai1LR8Rb6n-0umaedE7g}{10.10.2.236}{10.10.2.236:9300}{dilmrt}{ml.machine_memory=2799996928, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 2923, last-accepted version 146555 in term 2923" }

I read other posts about "master not discovered or elected yet", but I see that the difference between the errors in those posts and mine is that the node IDs are being discovered.

Is there anything I can do to force a master to be elected, or to reset the election process? I still have the data directories, so please can you advise on any other way I can restore my data?

Thanks
Ronen

You seem to have had a 5-node cluster:

an election requires at least 3 nodes with ids from [61_bqJv4Q1CqUYnXR4SliQ, DxDiXlTCQw2r86hvOgFZTA, CH9ToyI_Siqmu41a8LdecQ, aowKPk47SMeK0k9nph6yoA, 3eWboZMGSoOmM0UA49SsJA]

It looks like you've restored at least 3 of them; I imagine they're all logging similar-looking messages but they will be subtly different and the differences are important. Can you share all of these messages?

Thanks for the response. There was never a 5-node cluster; there were only ever 3 persistent volumes, so I'm not sure how it thinks there were 5.

Please see the links below to the logs for each node. I have enabled trace logging. The top line of each file is the main error.

es-cluster-0:

es-cluster-1:

es-cluster-2:

Thanks
Ronen

Thanks, that's helpful. Here's the problem:

an election requires at least 3 nodes with ids from ...
[                        DxDiXlTCQw2r86hvOgFZTA, CH9ToyI_Siqmu41a8LdecQ, aowKPk47SMeK0k9nph6yoA, DDm_M-p1QzStgEBUq4poAg, 3eWboZMGSoOmM0UA49SsJA]
[                        DxDiXlTCQw2r86hvOgFZTA, CH9ToyI_Siqmu41a8LdecQ, aowKPk47SMeK0k9nph6yoA, DDm_M-p1QzStgEBUq4poAg, 3eWboZMGSoOmM0UA49SsJA]
[61_bqJv4Q1CqUYnXR4SliQ, DxDiXlTCQw2r86hvOgFZTA, CH9ToyI_Siqmu41a8LdecQ, aowKPk47SMeK0k9nph6yoA,                         3eWboZMGSoOmM0UA49SsJA]

There are actually 6 different node IDs in play, so for a majority you need at least 4 of them to be present. Technically you only need 3 from each subset of 5 mentioned above, but in practice this means the same thing: you're missing a node, without which Elasticsearch cannot reconstruct the cluster state.
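To spell that out, here is a rough sketch of the same arithmetic, nothing more; it uses only the ID lists quoted above and the IDs of the three nodes that are actually running according to the log message in the first post (variable names are mine):

# Rough sketch: compare the three quoted ID lists against the IDs of the
# three nodes that were actually discovered in the first post's log message.
lists = [
    # first list (no 61_..., but includes DDm_...)
    {"DxDiXlTCQw2r86hvOgFZTA", "CH9ToyI_Siqmu41a8LdecQ", "aowKPk47SMeK0k9nph6yoA",
     "DDm_M-p1QzStgEBUq4poAg", "3eWboZMGSoOmM0UA49SsJA"},
    # second list (identical to the first)
    {"DxDiXlTCQw2r86hvOgFZTA", "CH9ToyI_Siqmu41a8LdecQ", "aowKPk47SMeK0k9nph6yoA",
     "DDm_M-p1QzStgEBUq4poAg", "3eWboZMGSoOmM0UA49SsJA"},
    # third list (includes 61_..., but no DDm_...)
    {"61_bqJv4Q1CqUYnXR4SliQ", "DxDiXlTCQw2r86hvOgFZTA", "CH9ToyI_Siqmu41a8LdecQ",
     "aowKPk47SMeK0k9nph6yoA", "3eWboZMGSoOmM0UA49SsJA"},
]

# es-cluster-0, es-cluster-1 and es-cluster-2, as discovered in the first post
running = {"CH9ToyI_Siqmu41a8LdecQ", "3eWboZMGSoOmM0UA49SsJA", "61_bqJv4Q1CqUYnXR4SliQ"}

all_ids = set().union(*lists)
print(f"{len(all_ids)} distinct IDs ever seen, {len(all_ids & running)} of them running")

for i, ids in enumerate(lists, start=1):
    present = ids & running
    needed = len(ids) // 2 + 1    # a majority of a 5-ID list is 3
    print(f"list {i}: {len(present)}/{needed} required IDs present, "
          f"missing {sorted(ids - running)}")

The third list is the one from the log in your first post, which is why that message says "which is a quorum"; the other two nodes are each missing a third ID from their own lists, so from their point of view there is no quorum.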

These nodes were, at some point in the past, all present in this cluster at the same time. No idea how, sorry, but Elasticsearch doesn't invent these node IDs freely so the only explanation is that you had more nodes than you do now.

OK, thanks, I understand now. I must have done something wrong with the backup/restore.

Now the question is: is there any way to recover the data?

I would try the restore again, in the hope that wherever these extra nodes came from, it happened after you took the backup. Make sure you shut everything down, restore the data paths of all the nodes to their new locations, and only then start things up.
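As a very rough sketch of that ordering only (all paths here are invented, the copy would really happen wherever the new volumes are mounted, and I'm assuming the cluster runs as a StatefulSet):

# Rough sketch, not a tested procedure: restore every node's complete data
# directory while nothing is running, then start the nodes back up.
# All paths below are hypothetical placeholders.
import pathlib
import shutil

backups = {
    "es-cluster-0": "/backup/es-cluster-0/data",   # hypothetical backup locations
    "es-cluster-1": "/backup/es-cluster-1/data",
    "es-cluster-2": "/backup/es-cluster-2/data",
}

# Step 1 (outside this script): scale the StatefulSet to 0 so no node is running.

# Step 2: copy each node's whole data path onto its new volume.
for node, src in backups.items():
    dest = pathlib.Path("/new-volumes") / node / "data"   # hypothetical mount point
    if dest.exists():
        shutil.rmtree(dest)        # the target data path must start out empty
    shutil.copytree(src, dest)     # copy the entire directory, not a subset of it

# Step 3 (outside this script): scale the StatefulSet back up to 3 nodes.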

If these extra nodes are present in the backup then I'm sorry to say that the backup doesn't include the latest cluster state so there's no safe way to recover the cluster.

Thanks for the help David! I appreciate the quick responses!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.