How to restart an Elasticsearch cluster (2 master nodes, 2 data nodes, 1 voting-only master-eligible node) after 1 master node and 1 data node failed due to a hardware failure, without losing data?

Hi team,
I have an Elasticsearch cluster that is set up across 3 servers using Docker. Here is the configuration I used:

  • server 1: 1 voting-only master-eligible node
  • server 2: 1 master node, 1 data node, and the snapshot repository
  • server 3: 1 master node, 1 data node
  • The snapshot repository is set up on an NFS file system, with server 2 as the NFS server and server 3 as the client.
  • ELK version: 8.5.0

Unfortunately, there was a power outage, and after power was restored I found that I had lost all the data on server 2, so I cannot restore my cluster from the snapshot. Before the outage, the data node on server 2 held all the primary shards of my indices and server 3 held only replica shards. Is there any way I can restart my cluster without losing data? Or at least, is there a way to get my data back and minimize the amount that is lost? Any advice would be greatly appreciated. Thank you!

If servers 1 and 3 are still available, the nodes on them should be able to form a cluster and elect a master. The replica shards should then be promoted to primary shards, and you should not have lost any data as long as all indices had a replica shard configured. In that case you should not need to restore from a snapshot.
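The "as long as all indices had a replica shard configured" condition can be checked from the output of `GET _cat/indices?format=json`, where each entry carries the index name and its configured replica count in the `rep` field (returned as a string). The helper below is a hypothetical sketch, not an Elastic tool: it lists indices configured with zero replicas, whose data would have lived only on the lost data node. The sample response in the usage line is made up for illustration.

```python
import json


def indices_without_replicas(cat_indices_json: str) -> list[str]:
    """Return the sorted names of indices configured with 0 replicas.

    Expects the JSON text of `GET _cat/indices?format=json`. The `rep`
    value is a string in _cat output, so it is converted with int();
    entries without a replica count are skipped.
    """
    entries = json.loads(cat_indices_json)
    return sorted(
        e["index"]
        for e in entries
        if e.get("rep") is not None and int(e["rep"]) == 0
    )


# Hypothetical response body, for illustration only.
sample = '[{"index": "logs-a", "rep": "1"}, {"index": "metrics", "rep": "0"}]'
print(indices_without_replicas(sample))
```

Any index this reports had no second copy, so its data can only come back from the lost node or a snapshot.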

Is that not what you are seeing?

Thank you so much for response, Christian!
I'm sorry I didn't make this clear in the post. In fact, because of the power outage, all 3 servers went down, which means the whole cluster was down. When the power was restored, I found that I had lost all the data on server 2, which hosted 1 data node and 1 master node. Now I only have data on the 2 remaining servers (1 voting-only node, 1 master node and 1 data node). As far as I know, if I restart my cluster carelessly I could lose data, because in some cases the cluster cannot recover its last state and fails to form again. What are the correct steps to restart my cluster? In other words, which node should I restart first?

No, that should not be the case. You should be fine restarting the nodes and letting Elasticsearch recover.

The most important thing is DO NOT SET cluster.initial_master_nodes ON ANY NODE. Make sure it's removed from every elasticsearch.yml file. As long as this setting is absent, Elasticsearch will not do anything destructive with whatever data is left. From your description I expect the cluster to recover fine, but the worst that can happen is that it won't form a cluster, and it will report that in the logs.
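Because this cluster runs in Docker, the setting may live either in `elasticsearch.yml` or in the container's environment (the Elasticsearch Docker image accepts settings as environment variables such as `cluster.initial_master_nodes=...`). The function below is a hypothetical helper, not part of Elasticsearch: a plain-text scan, not a real YAML parser, that reports the line numbers where `initial_master_nodes` appears outside a comment, so each config or compose file can be checked before restarting. It may produce false positives, and anything it flags should be reviewed by hand.

```python
import re

# Matches the flat key (cluster.initial_master_nodes), the nested YAML key
# (initial_master_nodes under "cluster:"), and Docker environment entries
# (cluster.initial_master_nodes=...).
_SETTING = re.compile(r"\binitial_master_nodes\b")


def find_initial_master_nodes(config_text: str) -> list[int]:
    """Return 1-based line numbers that appear to set initial_master_nodes.

    Lines whose first non-blank character is '#' are treated as comments
    and skipped; text after a '#' on other lines is ignored as well.
    """
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith("#"):
            continue
        code = stripped.split("#", 1)[0]  # drop any trailing comment
        if _SETTING.search(code):
            hits.append(lineno)
    return hits


# Example: a hypothetical elasticsearch.yml that still sets the option.
example = 'cluster.name: my-cluster\ncluster.initial_master_nodes: ["es01"]\n'
print(find_initial_master_nodes(example))
```

An empty result for every file means the setting is gone and the nodes can be restarted in any order.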

It doesn't matter what order you restart things, as long as cluster.initial_master_nodes is not set.

Thank you so much for the advice, David!
I removed the setting you mentioned and restarted my cluster successfully. Now I'm waiting for all the shards to be assigned again. It may take a day for things to get back to normal, but that is fine for me. Again, many thanks for your support :smiley:
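While waiting for shards to be reassigned, progress can be followed with `GET _cluster/health`, whose response includes `status`, `unassigned_shards`, `initializing_shards` and `active_shards_percent_as_number`. The function below is a hypothetical sketch that interprets that response body. It treats recovery as finished only when the cluster is green and no shard is unassigned or still initializing; that threshold is an assumption, not an official definition.

```python
import json


def recovery_status(health_json: str) -> tuple[bool, str]:
    """Interpret the JSON body of `GET _cluster/health`.

    Returns (done, summary), where done is True only when the cluster is
    green and no shards are unassigned or still initializing.
    """
    h = json.loads(health_json)
    done = (
        h["status"] == "green"
        and h["unassigned_shards"] == 0
        and h["initializing_shards"] == 0
    )
    summary = (
        f"status={h['status']} "
        f"unassigned={h['unassigned_shards']} "
        f"initializing={h['initializing_shards']} "
        f"active={h['active_shards_percent_as_number']:.1f}%"
    )
    return done, summary
```

Note that Elasticsearch never allocates a replica on the same node as its primary, so with only one data node running the cluster stays yellow until a second data node joins. For a shard that stays unassigned, `GET _cluster/allocation/explain` reports why it was not allocated.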

