How to restart data nodes with outdated data?

Sylmarch · August 18, 2022, 8:57am

Hi,

we have an Elasticsearch 7 cluster in production with 3 physical machines.

Each machine hosts 3 nodes:

1 node: master-eligible + data
2 nodes: data only

The cluster is configured so that replica shards are allocated to nodes located on a different physical machine than their primaries.

One of the physical machines has just crashed and it is going to be repaired in a few days.

The cluster status is currently "red" because one index had a replication factor of 0, and its missing shard was located on a node on the crashed machine.
This index is not important, as we can rebuild it from scratch.

I wonder what the procedure is to restart the data nodes on the crashed machine, given that data on the other machines is still being updated while the crashed machine is down:

Should we remove the red index before restarting the data nodes on the crashed machine, so that the cluster status becomes green again?
Is it safe to restart these data nodes with outdated data, or should we clear all data on the crashed machine before restarting the nodes (i.e. when we restart the crashed machine, new nodes will join the cluster)?
What is the right procedure to remove the data for a node? Is it enough to remove the folder defined by the "path.data" setting in config/elasticsearch.yml?

Thanks!

warkolm · August 22, 2022, 12:32am

If the red index has changed since you lost the node, then yes, delete it. Otherwise it should recover.

Otherwise, it does depend a bit on what version you are running.

Sylmarch · August 23, 2022, 7:55am

Now it's OK.

We removed the red index and the cluster became "green" again.
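In case it helps someone else, here is a minimal Python sketch of how the red index could be found and deleted over the REST API. It is not exactly what we ran. The address in ES_URL and the helper names (fetch_json, red_indices, delete_index) are assumptions for illustration, and it assumes a cluster without authentication or TLS.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def fetch_json(path, base_url=ES_URL):
    """GET an Elasticsearch endpoint and decode its JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def red_indices(fetch=fetch_json):
    """Return the sorted names of all indices whose health is red."""
    rows = fetch("/_cat/indices?health=red&format=json&h=index,health")
    return sorted(row["index"] for row in rows)


def delete_index(name, base_url=ES_URL):
    """Send DELETE /<index>. This is irreversible, so check the name first."""
    url = base_url + "/" + urllib.parse.quote(name, safe="")
    req = urllib.request.Request(url, method="DELETE")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling red_indices() first and reviewing the list by hand before passing a name to delete_index() keeps the destructive step explicit.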

We disconnected the physical machine from the network before repairing it, to prevent the old nodes with outdated data from joining the cluster when the machine started up.

After the physical machine was repaired, we connected to it. The Elasticsearch services had already stopped because the network was not available.
We purged the data/, logs/ and work/ folders.
We reconnected the physical machine to the network.
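For reference, the purge step could be scripted roughly as below. This is only a sketch: the function name purge_folders and the idea of passing the node's home directory are assumptions, and it must only be run while the Elasticsearch service on that node is stopped. It empties each folder but keeps the folder itself, so ownership and permissions stay as they were.

```python
import shutil
from pathlib import Path


def purge_folders(es_home, names=("data", "logs", "work")):
    """Delete the contents of each named folder under es_home.

    The folders themselves are kept. Returns the sorted list of removed paths.
    """
    removed = []
    for name in names:
        folder = Path(es_home) / name
        if not folder.is_dir():
            continue  # nothing to purge for this node
        for entry in folder.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real sub-directory: remove recursively
            else:
                entry.unlink()  # plain file or symlink
            removed.append(str(entry))
    return sorted(removed)
```

If "path.data" or "path.logs" in config/elasticsearch.yml points somewhere else, those locations are the ones to purge instead.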

Then we restarted the nodes one by one.
For each of them, we monitored the /_cat/health?v and /_cat/nodes?v&s=name endpoints to make sure that the cluster status stayed "green" and that the node successfully joined the cluster.
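We did this check by hand, but it could be automated along these lines. The sketch below polls the same two endpoints in JSON form (format=json instead of ?v). ES_URL, the node name passed in, and the helper names are assumptions, as is a cluster reachable without authentication.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def fetch_json(path, base_url=ES_URL):
    """GET an Elasticsearch endpoint and decode its JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def status_and_membership(node_name, fetch=fetch_json):
    """Return (cluster_status, joined) for the given node name."""
    health = fetch("/_cat/health?format=json")
    nodes = fetch("/_cat/nodes?format=json&h=name&s=name")
    status = health[0]["status"]
    joined = any(node["name"] == node_name for node in nodes)
    return status, joined


def wait_for_node(node_name, attempts=30, delay=10, fetch=fetch_json):
    """Poll until the node has joined and the cluster is green, or give up."""
    for _ in range(attempts):
        status, joined = status_and_membership(node_name, fetch)
        if joined and status == "green":
            return True
        time.sleep(delay)
    return False
```

Restarting the next node only after wait_for_node() returns True for the previous one mirrors the one-by-one procedure.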

We also saw that shard reallocation started after restarting the first data node.
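To tell when that reallocation has finished, one option (a sketch, not something we used) is to read the shard counters from /_cluster/health. The helper names and ES_URL are assumptions. The fields relocating_shards, initializing_shards and unassigned_shards are part of the cluster health response.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def fetch_cluster_health(base_url=ES_URL):
    """GET /_cluster/health and decode its JSON body."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def shards_in_flight(health):
    """Count shards that are still moving, starting up, or not assigned."""
    return (
        health["relocating_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )


def is_settled(health):
    """True when the cluster is green and no shard is still in flight."""
    return health["status"] == "green" and shards_in_flight(health) == 0
```

For example, is_settled(fetch_cluster_health()) can be checked before restarting the next node.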

So, you can close this case Mark. Have a nice day.

system · September 20, 2022, 7:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
