Master not discovered, removed nodes have been totally destroyed

Hi,
I have removed half of the nodes from our cluster, and the removed nodes and their machines have been completely destroyed.
The remaining nodes now log messages like:

master not discovered or elected yet, an election requires at least nodes with ids from ...

have discovered [] which is not quorum;

discovery will continue using...

After reading the docs, I found that the way to recover is to bring the removed nodes back so that the remaining nodes can discover enough master-eligible nodes to form a quorum. However, the removed nodes are permanently gone and cannot be reinstated. We would rather not delete the whole path.data and rebuild the cluster: we want to keep the index shard data in path.data and wipe only the previously saved master-eligible state and voting configuration stored there.

Is there any way to achieve this?

@DavidTurner

Would you be interested in taking a look at this?

Have a look at this thread, which should apply as you no longer have a quorum of master nodes.


Hi,
After reading the thread, I added many new empty nodes so that a quorum could be formed. A master was elected from the newly added nodes, but the existing old nodes failed to join the new master.
The log is below:

join validation on cluster state with a different cluster uuid ZTfGFra-QrS4FpdLjlQGFQ than local cluster uuid rpjd3TKOTyeqpbCgBVFynQ, rejecting
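
As a rough illustration of why the join is rejected, here is a minimal Python sketch. It is a simplified model, not Elasticsearch's actual code: the names `JoinRejected` and `validate_join` are made up for this example, and the only assumption is that a node which already belongs to a cluster refuses a cluster state carrying a different cluster UUID. The two UUIDs are the ones from the log line above.

```python
class JoinRejected(Exception):
    """Hypothetical error: the node refuses a master from another cluster."""


def validate_join(local_cluster_uuid: str, incoming_cluster_uuid: str) -> None:
    # Simplified model: a node that already belongs to a cluster only
    # accepts a cluster state that carries the same cluster UUID.
    if incoming_cluster_uuid != local_cluster_uuid:
        raise JoinRejected(
            "join validation on cluster state with a different cluster uuid "
            f"{incoming_cluster_uuid} than local cluster uuid "
            f"{local_cluster_uuid}, rejecting"
        )


# UUIDs taken from the log line: the old node's cluster and the cluster
# formed by the new empty nodes.
OLD_NODE_UUID = "rpjd3TKOTyeqpbCgBVFynQ"
NEW_MASTER_UUID = "ZTfGFra-QrS4FpdLjlQGFQ"

try:
    validate_join(OLD_NODE_UUID, NEW_MASTER_UUID)
except JoinRejected as err:
    print(err)
```

In this model the new empty nodes formed a separate cluster with its own UUID, so the old nodes, which still remember the original UUID in path.data, will not join it.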

How can I wipe the cluster state information saved in path.data while keeping the shard index data, so that the old nodes can join and form a cluster?

Our Elasticsearch version: 7.3.0

We have no snapshot, and the removed nodes cannot come back to the cluster. How can the old nodes, with their existing path.data, form a quorum and a cluster again?

I am not sure that is possible. See the documentation for further details.

If you lost half or more of the master nodes then you have lost the cluster metadata, without which Elasticsearch cannot correctly interpret the data held in all the shards. Since you don't have a snapshot either, the only sensible thing to do is rebuild the cluster from data held elsewhere as best as you can.


Hi,
I have been reading this blog:

For example, an Elasticsearch 7.0 cluster will not automatically recover if half or more of the master-eligible nodes are permanently lost. It is common to have three master-eligible nodes in a cluster, allowing Elasticsearch to tolerate the loss of one of them without downtime. If two of them are permanently lost then the remaining node cannot safely make any further progress.

Versions of Elasticsearch before 7.0 would quietly allow a cluster to recover from this situation. Users could bring their cluster back online by starting up new, empty, master-eligible nodes to replace any number of lost ones. An automated recovery from the permanent loss of half or more of the master-eligible nodes is unsafe, because none of the remaining nodes are certain to have a copy of the latest cluster state
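
The arithmetic behind the quoted passage can be sketched in a few lines of Python. This is only a model under stated assumptions: an election needs votes from strictly more than half of the nodes in the voting configuration, and nodes outside that configuration do not count. The function name `has_quorum` and the node names are hypothetical.

```python
def has_quorum(voting_config: set, live_nodes: set) -> bool:
    # Only nodes that are in the voting configuration can contribute a vote.
    votes = len(voting_config & live_nodes)
    # A quorum is a strict majority of the voting configuration.
    return votes * 2 > len(voting_config)


three = {"master-1", "master-2", "master-3"}

# Losing one of three master-eligible nodes is tolerated.
print(has_quorum(three, {"master-1", "master-2"}))

# Losing two of three leaves the remaining node without a quorum.
print(has_quorum(three, {"master-1"}))

# Starting new empty nodes does not help in this model: they are not
# part of the existing voting configuration.
print(has_quorum(three, {"master-1", "new-1", "new-2", "new-3"}))
```

With four voting nodes, losing exactly two also fails the check, which matches the "half or more" wording in the blog.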

7.x seems to be more restrictive about re-forming the cluster.

According to this:

If neither of these recovery actions is possible then the last resort is the elasticsearch-node unsafe recovery tool. This is a command-line tool that a system administrator can run to perform unsafe actions such as electing a stale master from a minority. By making the steps that can break consistency very explicit, Elasticsearch 7.0 eliminates the risk of unintentionally causing data loss through a series of unsafe operations.

Unsafe bootstrap may make sense here.
