How to restart an Elasticsearch cluster (2 master nodes, 2 data nodes, 1 voting-only master-eligible node) without losing data after 1 master node and 1 data node failed due to a hardware failure?

Hi team,
I have an Elasticsearch cluster that is set up across 3 servers using Docker. Here is the configuration I used:

  • server 1: 1 voting-only master
  • server 2: 1 master node, 1 data node, and the snapshot repository
  • server 3: 1 master node, 1 data node
  • The snapshot repository is set up on an NFS filesystem, with server 2 as the NFS server and server 3 as the client.
  • ELK version: 8.5.0

Unfortunately, there was a power outage, and after power was restored I found that I had lost all the data on server 2, so I cannot restore my cluster from the snapshot. Before the shutdown, the data node on server 2 held all the primary shards of my indices and server 3 held only replica shards. Is there any way I can restart my cluster without losing data? Or at least, is there a chance that I can get my data back and minimize the amount that might be lost? Any advice would be greatly appreciated. Thank you!

If you still have nodes 1 and 3 available, they should be able to form a cluster and elect a master. The replica shards should then be promoted to primary shards, and you should not have lost any data as long as all indices had a replica shard configured. In this case you should not need to restore from a snapshot.
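To illustrate why nodes 1 and 3 should be enough to elect a master: an election needs a strict majority of the voting nodes. The sketch below assumes that all 3 master-eligible nodes (2 master nodes plus the voting-only node) are in the voting configuration; the function name is made up for illustration and is not part of Elasticsearch.

```python
def can_elect_master(voting_nodes: int, available_voting_nodes: int) -> bool:
    """Return True if the available voting nodes form a strict majority."""
    quorum = voting_nodes // 2 + 1  # more than half of the voting nodes
    return available_voting_nodes >= quorum


# Assumption: 3 voting nodes in total, and only the one on server 2 is gone.
print(can_elect_master(3, 2))  # True: 2 of 3 is a majority
print(can_elect_master(3, 1))  # False: 1 of 3 is not
```

Note that the voting-only node can vote but cannot become master itself, so in this scenario the elected master would be the master node on server 3.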

Is that not what you are seeing?

Thank you so much for the response, Christian!
I'm sorry that I didn't explain this clearly in the post. In fact, due to the power outage, all 3 servers went down, which means my whole cluster was down. When the power was restored, I found that I had lost all the data on server 2, which hosted 1 data node and 1 master node. Now I only have data on the 2 remaining servers (1 voting-only node, 1 master node and 1 data node). As far as I know, if I restart my cluster carelessly, I could lose my data, because there are cases where a cluster cannot recreate its last state and fails to form again. What are the correct steps to restart my cluster? I mean, which node should I restart first?

No, that should not be the case. You should be fine restarting the nodes and letting Elasticsearch recover.

The most important thing is DO NOT SET cluster.initial_master_nodes ON ANY NODE. Make sure it's removed from every elasticsearch.yml file. As long as this setting is missing, Elasticsearch will not do anything destructive with whatever data is left. From your description I expect that the cluster will recover OK, but the worst that can happen is that it won't form a cluster, and it will tell you about that in the logs.

It doesn't matter what order you restart things, as long as cluster.initial_master_nodes is not set.
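A rough way to double-check that the setting is really gone is to scan the text of each elasticsearch.yml for uncommented lines that mention it. This is only a sketch: the function name and the sample config (including the node names) are made up for illustration, and it looks at plain text rather than parsing YAML. Since the nodes run in Docker, the setting may also be passed as a container environment variable, which this check does not look at.

```python
def find_initial_master_nodes(config_text: str) -> list[int]:
    """Return 1-based line numbers of uncommented lines mentioning initial_master_nodes."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        # Drop everything after '#' so commented-out settings are ignored.
        active = line.split("#", 1)[0]
        if "initial_master_nodes" in active:
            hits.append(lineno)
    return hits


# Hypothetical config contents; the cluster and node names are invented.
sample = """\
cluster.name: my-cluster
# cluster.initial_master_nodes: ["old-node"]
cluster.initial_master_nodes: ["es-master-2", "es-master-3"]
"""
print(find_initial_master_nodes(sample))  # [3]
```

An empty list for every node's config means the setting is absent from the files. Matching on the bare key name also catches the nested YAML form, where initial_master_nodes sits on its own line under cluster.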

Thank you so much for the advice, David!
I removed the setting you mentioned and restarted my cluster successfully. Now I'm waiting for all the shards to be assigned again. It may take a day for things to work normally as before, but that is fine for me. Again, many thanks for your support :smiley:
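For anyone following the same path, progress while the shards are being assigned can be read from the cluster health API (GET _cluster/health). Below is a minimal sketch that summarizes an already-parsed response. The field names are the ones that API returns, but the helper names and the sample values are invented for illustration, and fetching the response is left out because it depends on how security is configured on the cluster.

```python
def summarize_health(health: dict) -> str:
    """Build a one-line progress summary from a parsed _cluster/health response."""
    return (
        f"status={health['status']}, "
        f"active={health['active_shards_percent_as_number']:.1f}%, "
        f"initializing={health['initializing_shards']}, "
        f"unassigned={health['unassigned_shards']}"
    )


def recovery_finished(health: dict) -> bool:
    """True once the cluster is green and no shard is waiting or initializing."""
    return (
        health["status"] == "green"
        and health["unassigned_shards"] == 0
        and health["initializing_shards"] == 0
    )


# Made-up values for illustration only, not output from a real cluster.
sample_health = {
    "status": "yellow",
    "initializing_shards": 2,
    "unassigned_shards": 40,
    "active_shards_percent_as_number": 58.0,
}
print(summarize_health(sample_health))
print(recovery_finished(sample_health))  # False
```

The status stays yellow for as long as any replica has no node to be allocated to, which is expected while only one data node is running, because a replica is never placed on the same node as its primary.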
