We have an Elasticsearch 7 cluster in production running on 3 physical machines.
Each machine hosts 3 nodes:
- 1 node: master-eligible + data
- 2 nodes: data only
The cluster is configured so that replica shards are allocated on nodes located on different physical machines.
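For reference, this is done with shard allocation awareness; each node's elasticsearch.yml looks roughly like this (the attribute name and values below are simplified placeholders, not our exact config):

```yaml
# Tag every node with the physical machine it runs on
# (nodes on machine2/machine3 use their own value)
node.attr.rack_id: machine1

# Tell the allocator not to place a primary and its replica
# on nodes sharing the same rack_id
cluster.routing.allocation.awareness.attributes: rack_id
```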
One of the physical machines has just crashed and will be repaired in a few days.
The cluster status is currently "red" because one index has its number of replicas set to 0, and the only copy of the missing shard was located on a node on the crashed machine.
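This is how we confirmed which index is red and why the shard is unassigned (hostname/port are placeholders for one of the surviving nodes):

```shell
# List only the indices whose health is red
curl -s "localhost:9200/_cat/indices?v&health=red"

# Ask the cluster why the shard cannot be assigned
curl -s "localhost:9200/_cluster/allocation/explain?pretty"
```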
This index is not important as we can rebuild it from scratch.
I wonder what the right procedure is to restart the data nodes on the crashed machine, given that data on the other machines keeps being updated during the downtime:
- should we delete the red index before restarting the data nodes on the crashed machine, so that the cluster status becomes green again?
- is it safe to restart these data nodes with outdated data, or should we wipe all Elasticsearch data on the crashed machine before restarting the nodes (so that they join the cluster as fresh, empty nodes)?
- what is the right procedure to remove a node's data? Is it just to delete the folder defined by the "path.data" setting in config/elasticsearch.yml?
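To make the question concrete, this is the kind of sequence I have in mind; the index name and data path below are placeholders, and I am unsure whether step 3 is safe or even necessary:

```shell
# 1) Delete the red index (it can be rebuilt from scratch);
#    "my-rebuildable-index" is a placeholder name
curl -X DELETE "localhost:9200/my-rebuildable-index"

# 2) Check that the cluster status goes back to green
curl -s "localhost:9200/_cluster/health?pretty"

# 3) On the repaired machine, with each Elasticsearch node stopped,
#    clear the contents of that node's data directory
#    (the actual path is whatever path.data points to in elasticsearch.yml)
rm -rf /var/lib/elasticsearch/node1/data/*

# 4) Start the nodes on the repaired machine and let them rejoin
```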