Master not discovered or elected yet, an election requires a node with id [F-Tn-Q6vQuKE0Fgi5qtUMg] + 503 master not discovered exception

I have an Elasticsearch cluster that was working well before. But yesterday I accidentally deleted the master node that had been elected. After this, the other master node cannot be elected by the remaining nodes, and I see the following error in its log file:

master not discovered or elected yet, an election requires a node with id [F-Tn-Q6vQuKE0Fgi5qtUMg], have only discovered non-quorum [{OPT__Master2}{ezEK9jukQTGtVW2DP3cSjA}{tHNwll8bTlmlmNes1HJPJg}{OPT__Master2}{192.168.1.30}{192.168.1.30:9801}{m}];

I wanted to fix this by using the /_cluster/voting_config_exclusions API, but that gave me another error:

503 master not discovered exception
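The request I sent was roughly this (using the node ID from the log message above; the exact call may have differed slightly):

POST /_cluster/voting_config_exclusions?node_ids=F-Tn-Q6vQuKE0Fgi5qtUMg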

What can I do?

See these docs for help:

In particular:

If the logs or the health report indicate that Elasticsearch can’t discover enough nodes to form a quorum, you must address the reasons preventing Elasticsearch from discovering the missing nodes. The missing nodes are needed to reconstruct the cluster metadata. Without the cluster metadata, the data in your cluster is meaningless. The cluster metadata is stored on a subset of the master-eligible nodes in the cluster. If a quorum can’t be discovered, the missing nodes were the ones holding the cluster metadata.

Ensure there are enough nodes running to form a quorum and that every node can communicate with every other node over the network. Elasticsearch will report additional details about network connectivity if the election problems persist for more than a few minutes. If you can’t start enough nodes to form a quorum, start a new cluster and restore data from a recent snapshot.
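If it comes to that, the restore path is roughly: register the repository on the new cluster, check which snapshots it contains, and restore the most recent one. A minimal sketch, with placeholder repository, snapshot and path names:

PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-repo"
  }
}

GET _snapshot/my_backup/_all

POST _snapshot/my_backup/snapshot_1/_restore
{
  "indices": "*"
}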

1 Like

How many master-eligible nodes did your cluster initially have? Did you follow the guidelines in the documentation when setting up the cluster? Which version of Elasticsearch are you using?

1 Like

I can't get back (recover) the missing node or the cluster metadata on it; I have tried many methods to recover it from the hard disk, but all failed.
Now, following your advice, I checked the repository directory I had saved, and found that there were no snapshot files. There were only some indices and UUID directories in each repo, like below:

(base) [root@hm-194 es-repo]# tree -d -L 3
.
├── backups
│   ├── market_subjects
│   │   └── indices
│   ├── patents
│   │   └── indices
│   └── SLRC
│       ├── indices
│       └── tests-yC7oYRGiRgS_uK20H-A7ug
└── long_term_backups
    └── old_market_subjects_data
        └── indices

11 directories

Here, "market_subjects", "patents" and "old_market_subjects_data" are the repo names that I had created before, but "SLRC" is the cluster name.
I didn't find any snapshot files, yet I had previously seen the list of snapshots generated by my snapshot policy. So is the data under the repo path supposed to look like the above, or is something missing?
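In case it matters, this is the kind of check I would expect to run once a cluster has one of these repositories registered (readonly here is just a precaution, and the location is relative to path.repo):

PUT _snapshot/market_subjects
{
  "type": "fs",
  "settings": {
    "location": "backups/market_subjects",
    "readonly": true
  }
}

POST _snapshot/market_subjects/_verify

GET _snapshot/market_subjects/_all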

I think this issue answers Christian's questions: you only had two master-eligible nodes. But as the docs say:

A resilient cluster needs three master-eligible nodes so that if one of them fails then the remaining two still form a majority and can hold a successful election.

Unfortunately if you can't bring the node back online you'll need to build a new cluster and restore your data from backups.
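For future reference, a minimal sketch of what that looks like in elasticsearch.yml on three dedicated master-eligible nodes (names and addresses here are placeholders):

node.name: master-1                 # master-2 / master-3 on the other two nodes
node.roles: [ master ]
discovery.seed_hosts: ["master-1", "master-2", "master-3"]
# only used the very first time the cluster forms, then remove it:
cluster.initial_master_nodes: ["master-1", "master-2", "master-3"]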

2 Likes

If you have multiple nodes, Elasticsearch snapshots require a shared filesystem repository, e.g. NFS. Is this what you have configured and mounted on your nodes as the repo path?

1 Like

I had six data nodes and two master nodes before, but yesterday I accidentally deleted one data node and one master node. All nodes are version 8.3.3.

I was originally wondering whether there was any way to force the election of my other master node, but now the answer is obvious: the other master node does not have the cluster metadata, so this is not feasible.

Now I just want to confirm one thing: can I use the snapshot repo path I showed you above to recover data? Even though there is no specific snapshot file, there is a lot of data stored in the indices folders. Can I use it to restore my indexes on a newly created cluster? If so, how? I looked through the official Elastic documentation and it says only snapshots can be used to restore, but in my attempts I created a new cluster with this repo path configured, and the snapshot list in Kibana is empty.

Yes, I had configured it in ./config/elasticsearch.yml on every node (master / data).
[Screenshot 2024-03-22 15.13.19: the repo path configuration in elasticsearch.yml]
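The setting I mean is path.repo; on my nodes it looks roughly like this (the exact path here is an example):

path.repo: ["/home/es-repo"]    # set to the same path on every node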

@DavidTurner This is unfortunately not an uncommon issue even though it is covered by the docs. As far as I know there are no scenarios where bootstrapping a new cluster with 2 master-eligible nodes is recommended. Given that Elasticsearch already adds a number of bootstrap checks for what is deemed to be a production cluster, would verifying that the number of initial master nodes is not 2 be a good candidate for a new bootstrap check? If we wanted to still allow this for some reason, might it be suitable to have the user explicitly enable an "unsafe operation mode" through a configuration setting? It might also be useful to log a warning if there is only a single master-eligible node in a cluster that is not a single-node cluster.

1 Like

What type of storage are you using for the repo? Is it an NFS mount?

LVM, but the LVM volume didn't have any snapshots, so I tried ext4magic and that failed too.

(base) [root@hm-194 home]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0    60T  0 disk 
├─sda1            8:1    0     1M  0 part 
├─sda2            8:2    0     1G  0 part /boot
└─sda3            8:3    0    60T  0 part 
  ├─centos-root 253:0    0    50G  0 lvm  /
  ├─centos-swap 253:1    0     4G  0 lvm  [SWAP]
  └─centos-home 253:2    0    60T  0 lvm  /home
nvme0n1         259:0    0 931.5G  0 disk /ssd1
nvme1n1         259:1    0 931.5G  0 disk /ssd2
(base) [root@hm-194 home]# 

That does not look like a shared filesystem. Is it?

For Elasticsearch snapshots to work they need shared storage, e.g. NFS storage accessible by all nodes over the network, so that files written by one node can be read from the repo by all other nodes. Having local directories under the same path on different machines does not work.
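As a rough sketch of what that means in practice, assuming an NFS server called nfs-server exporting /exports/es-repo (all names and paths here are examples):

# on every Elasticsearch node, mount the same NFS export at the same path
mount -t nfs nfs-server:/exports/es-repo /mnt/es-repo

# or persistently in /etc/fstab:
# nfs-server:/exports/es-repo  /mnt/es-repo  nfs  defaults  0 0

# then point every node's elasticsearch.yml at that mount:
# path.repo: ["/mnt/es-repo"]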

1 Like

Oh no, I really didn't know this was the case before. Is there really no other way to recover this data?

Anyway, thank you ( @DavidTurner , @Christian_Dahlqvist ) very much for taking the time to help me figure it out.

Bootstrap checks run far too early in startup to know how many nodes there are in the cluster unfortunately, but you're right, it'd be good to have something in this area. I think these days we could add this check to the health report so I opened #106640 to suggest that.

At bootstrap time you would know how cluster.initial_master_nodes is configured though, so you could check that this is not exactly 2 nodes. This would likely be enough to catch a significant number of cases early on. You could still add it to the health report, but I am afraid that is likely to be overlooked, just as the docs often are.

1 Like

Yeah but cluster.initial_master_nodes should only be set the first time the cluster starts, and often folks will get to a 2-node cluster by growing from a one-node cluster, so I don't think this will catch enough cases.

1 Like

There is the elasticsearch-node tool that may be able to help you, but I have never had to use it myself.
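The subcommands that look relevant here are roughly these (both must be run while the node itself is shut down):

./bin/elasticsearch-node unsafe-bootstrap   # on one surviving master-eligible node, forms a new cluster from its local metadata
./bin/elasticsearch-node detach-cluster     # on each other node, so it can join the newly formed cluster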

1 Like

I really appreciate your help; I have fixed my problem the way you suggested.
Although some shards could not be recovered due to my own mistakes ( I did not set the number of replicas, snapshot settings and repo configuration correctly ), it has already saved my life.

To help other people who run into the same problem, I'll record my steps here:

  1. First, I shut down all nodes in my catastrophically damaged cluster.

  2. Secondly, I picked the data node that occupied the largest storage space in my damaged cluster, modified its configuration to change its role to [master,data], and used the elasticsearch-node tool to perform unsafe cluster bootstrapping. This gave me a new cluster, but the node's data and ID were unchanged.
    In fact, some data can already be recovered at this point, but it is still incomplete; how much depends on your replicas and sharding.

  3. Immediately afterwards, bring up the remaining nodes. More accurately, you migrate each node from the damaged cluster into the new cluster, which requires the detach-cluster operation.
    First execute the ./bin/elasticsearch-node detach-cluster command, then modify the discovery settings in the configuration to point to the master node bootstrapped in step 2, and then start the node. (A rough command sketch follows at the end of this post.)

  • I hope everyone can get some help from this like I did :innocent:
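Roughly, the sequence of commands and settings was as follows (the host name is just a placeholder; adapt it to the node you bootstrapped):

# step 1: stop every node in the broken cluster

# step 2: on the chosen data node, edit elasticsearch.yml so it is master-eligible:
#   node.roles: [ master, data ]
# then, with the node still stopped, bootstrap a new one-node cluster from its local metadata:
./bin/elasticsearch-node unsafe-bootstrap
# start this node; it now forms the new cluster

# step 3: on each remaining (stopped) node, detach it from the old cluster:
./bin/elasticsearch-node detach-cluster
# point discovery at the new master in elasticsearch.yml, e.g.:
#   discovery.seed_hosts: ["new-master-host:9300"]
# then start the node so it joins the new cluster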