How to restart an Elasticsearch cluster (2 master nodes, 2 data nodes, 1 voting-only master-eligible node) without losing data after 1 master node and 1 data node failed due to hardware failure?

ThuyNguyen · October 14, 2023, 10:35am

Hi team,
I have an Elasticsearch cluster that is set up across 3 servers using Docker. Here is the configuration I used:

server 1: 1 voting-only master-eligible node
server 2: 1 master node, 1 data node, and the snapshot repository
server 3: 1 master node, 1 data node
The snapshot repository is set up on an NFS file system, with server 2 as the NFS server and server 3 as the client.
ELK version: 8.5.0

Unfortunately, there was a power outage, and after power was restored I found that all data on server 2 was lost, so I cannot restore my cluster from the snapshot. Before the outage, the data node on server 2 held all the primary shards of my indices and server 3 held only replica shards. Is there any way to restart my cluster without losing data? Or, at least, is there a way to get my data back and minimize the amount that might be lost? Any advice would be greatly appreciated. Thank you!

Christian_Dahlqvist · October 14, 2023, 10:42am

If nodes 1 and 3 are still available, they should be able to form a cluster and elect a master. The replica shards should then be promoted to primary shards, and you should not have lost any data as long as every index had a replica shard configured. In this case you should not need to restore from a snapshot.

Is that not what you are seeing?
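Whether the replicas on server 3 are enough depends on every index having at least one replica configured. A minimal sketch of that check, assuming the cluster is reachable at a hypothetical `http://localhost:9200` without authentication (8.x enables security by default, so adjust the URL, TLS and credentials for your setup). It reads `GET _cat/indices?format=json&h=index,rep` and lists indices whose replica count is 0; those are the ones whose data lived only on the failed node.

```python
import json
import urllib.request

# Hypothetical address; adjust host, port, TLS and credentials for your cluster.
ES_URL = "http://localhost:9200"


def fetch_index_replicas(base_url: str = ES_URL) -> list[dict]:
    """Return _cat/indices rows containing only the index name and replica count."""
    url = f"{base_url}/_cat/indices?format=json&h=index,rep"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def indices_without_replicas(rows: list[dict]) -> list[str]:
    """Sorted names of indices configured with zero replicas.

    The cat APIs return numbers as strings in JSON format, hence int().
    """
    return sorted(row["index"] for row in rows if int(row["rep"]) == 0)
```

For example, `indices_without_replicas(fetch_index_replicas())` returns the at-risk index names; an empty list means every index had a replica.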

ThuyNguyen · October 14, 2023, 11:54am

Thank you so much for your response, Christian!
I'm sorry I didn't make this clear in the post. In fact, because of the power outage, all 3 servers went down, which means my whole cluster was down. When power was restored, I found that all the data on server 2, which hosts 1 data node and 1 master node, was lost. Now I only have data on the 2 remaining servers (1 voting-only node, 1 master node and 1 data node). As far as I know, if I restart my cluster carelessly I could lose data, because in some cases the cluster cannot recover its last state and fails to form again. What are the correct steps to restart my cluster? In other words, which node should I restart first?

Christian_Dahlqvist · October 14, 2023, 1:24pm

No, that should not be the case. You should be fine restarting the nodes and letting Elasticsearch recover.

DavidTurner · October 14, 2023, 3:23pm

The most important thing is: DO NOT SET cluster.initial_master_nodes ON ANY NODE. Make sure it's removed from every elasticsearch.yml file. As long as this setting is missing, Elasticsearch will not do anything destructive with whatever data is left. From your description I expect the cluster to recover OK, but the worst that can happen is that it won't form a cluster, and it will tell you about that in the logs.

It doesn't matter what order you restart things, as long as cluster.initial_master_nodes is not set.
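Because the setting has to be gone from every node, a quick scan of each node's configuration before restarting can help. A minimal sketch, assuming the configuration files are readable as plain text; with Docker the setting may also appear as a compose-file environment entry such as `cluster.initial_master_nodes=es01,es02` (node names hypothetical). It is a naive line-based search rather than a YAML parser: it drops anything after `#` and then looks for `initial_master_nodes`, which catches the dotted form, the nested `cluster:` / `initial_master_nodes:` form and the environment-entry form.

```python
from pathlib import Path

SETTING = "initial_master_nodes"


def find_initial_master_nodes_lines(text: str) -> list[int]:
    """1-based line numbers that mention initial_master_nodes outside a # comment."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Naive comment stripping: also cuts at a '#' inside a quoted value.
        code = line.split("#", 1)[0]
        if SETTING in code:
            hits.append(lineno)
    return hits


def scan_files(paths: list[str]) -> dict[str, list[int]]:
    """Map each file that still sets the option to its offending line numbers."""
    report = {}
    for p in paths:
        hits = find_initial_master_nodes_lines(Path(p).read_text(encoding="utf-8"))
        if hits:
            report[p] = hits
    return report
```

For example, `scan_files(["elasticsearch.yml", "docker-compose.yml"])` (hypothetical paths) should return an empty dict before any node is restarted.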

ThuyNguyen · October 17, 2023, 3:36am

Thank you so much for the advice, David!
I removed the setting you mentioned and restarted my cluster successfully. Now I'm waiting for all the shards to be assigned again. It may take a day for things to return to normal, but that's fine for me. Again, many thanks for your support.
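Shard recovery progress can be followed with `GET _cluster/health`. A minimal sketch, assuming the same hypothetical unauthenticated `http://localhost:9200` endpoint, that polls the health API and prints a progress line. One caveat shapes the stopping rule: Elasticsearch never places a replica on the same node as its primary, so while only one data node is running the replicas stay unassigned and `yellow` is the best status reachable; `green` needs a second data node.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical; add TLS/auth as needed


def summarize_health(health: dict) -> str:
    """One-line progress summary from a GET _cluster/health response."""
    return (
        f"status={health['status']} "
        f"active={health['active_shards_percent_as_number']:.1f}% "
        f"initializing={health['initializing_shards']} "
        f"unassigned={health['unassigned_shards']}"
    )


def recovery_done(health: dict) -> bool:
    """True once nothing more can recover.

    'green' means every shard is assigned. With a single data node, replicas
    can never be assigned, so 'yellow' (all primaries active) with no shards
    initializing or relocating is treated as finished.
    """
    if health["status"] == "green":
        return True
    settled = health["initializing_shards"] == 0 and health["relocating_shards"] == 0
    return health["status"] == "yellow" and health["number_of_data_nodes"] == 1 and settled


def wait_for_recovery(base_url: str = ES_URL, poll_seconds: int = 60) -> None:
    """Poll cluster health, printing progress until recovery_done() is true."""
    while True:
        with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
            health = json.load(resp)
        print(summarize_health(health))
        if recovery_done(health):
            return
        time.sleep(poll_seconds)
```

Client-side polling is used here so that progress is printed; the health API's own `wait_for_status` and `timeout` parameters are a server-side alternative when you only need to block until a given status is reached.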

system · November 14, 2023, 3:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Elastic search cluster restart	Elasticsearch	8	372	June 4, 2018
Recover a broken 3 node elasticsearch cluster that has only 1 node left	Elasticsearch	6	2258	September 12, 2020
ElasticSearch cluster restarting/adding node(s)	Elasticsearch	11	1267	April 4, 2018
2 Nodes crashed, how to get last Node up an running	Elasticsearch	5	240	July 29, 2022
How to restart data nodes with outdated data?	Elasticsearch	3	277	September 20, 2022