How to recover cluster when 2 master nodes have been lost

Hi everyone,
My cluster consists of 3 master nodes and 3 data nodes. Unfortunately, due to a hardware problem, we lost 2 master nodes at the same time, and the remaining one was not able to take over as master.

I have now recreated those 2 master nodes with the same node names and IPs, but they are not able to join the cluster. When I start all 3 master nodes, I get the following message:
[masternode-2] received cluster state from {masternode-3}{MZkz2i3qQYCpivdepDtACA}{4naaRVrvQG6_aLXw5i9qsA}{}{}{cdhilmrstw}{ml.machine_memory=20994146304, ml.max_open_jobs=20, xpack.installed=true, transform.node=true} with a different cluster uuid F_KaGI1GR2Kw2oPmS-czmA than local cluster uuid L4iw4uQhQiCStSUtadLjQg, rejecting

masternode-2 is the surviving node; masternode-1 and masternode-3 are the ones we lost. It seems that the latest cluster state had masternode-3 as the elected master, and when it tries to join we get that message.
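
For anyone comparing many such log lines, here is a minimal Python sketch that pulls the two node names and the two cluster UUIDs out of the rejection message quoted above. The regular expression and the function name `parse_uuid_mismatch` are my own assumptions based on that single log line, not an official log format.

```python
import re

# Assumed pattern, derived only from the one rejection message in this thread.
UUID_MISMATCH = re.compile(
    r"\[(?P<local_node>[^\]]+)\] received cluster state from \{(?P<remote_node>[^}]+)\}"
    r".* with a different cluster uuid (?P<remote_uuid>[\w-]+)"
    r" than local cluster uuid (?P<local_uuid>[\w-]+), rejecting"
)

def parse_uuid_mismatch(line):
    """Return the node names and cluster UUIDs from a rejection line, or None."""
    match = UUID_MISMATCH.search(line)
    return match.groupdict() if match else None
```

If the local and remote UUIDs differ, as they do here, the two nodes belong to two separately bootstrapped clusters, which is why the join is rejected.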

Any hint on how to recover this cluster without losing the data?

The cluster metadata is stored on a majority of the master nodes, so 2 of the 3, and it was lost when those nodes died. Without this metadata, the data in your cluster is unfortunately not meaningful. You'll need to restore it from a snapshot.
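
To make the "majority" point concrete, here is a small Python sketch of the arithmetic. It is a simplification that treats every master-eligible node as one vote; the function names are hypothetical and it does not model Elasticsearch's actual voting configuration.

```python
def majority(master_eligible):
    """Smallest number of nodes that forms a majority of the master-eligible set."""
    return master_eligible // 2 + 1

def has_quorum(master_eligible, surviving):
    """True if the surviving master-eligible nodes still form a majority."""
    return surviving >= majority(master_eligible)

# The situation in this thread: 3 master nodes, 1 survivor.
print(majority(3))       # 2
print(has_quorum(3, 1))  # False
```

With 3 master nodes a majority is 2, so losing 2 at once leaves the single survivor unable to elect itself or to prove that its copy of the metadata is the latest one.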

Thanks David. Unfortunately we don't have a snapshot (yeah, I know we should :sweat_smile:). Do you think I could use the strategy of detaching the nodes from the cluster with the elasticsearch-node tool, as you suggested in Recover data after the lost of master [7.1.1] - #5 by DavidTurner?

You can try, but as I said there, it comes with no guarantees. It might not even warn you about any lost data.
