How to restore an Elasticsearch cluster when more than half of the master-eligible nodes are down simultaneously

We have deployed an Elasticsearch cluster consisting of 6 nodes with the following roles:

node-1    master-eligible,data
node-2    master-eligible,data
node-3    master-eligible,data
node-4    data
node-5    data
node-6    data

Due to a data center failure, node-1 and node-2 were disconnected at the same time, and the cluster became unavailable.

I know that when more than half of the master-eligible nodes in a cluster are disconnected, the cluster may become unavailable. To get the cluster running again, I would like to add two new machines, node-7 and node-8, to replace node-1 and node-2. The configuration is as follows:

cluster.initial_master_nodes: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.ping.unicast.hosts: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.minimum_master_nodes: 2

However, this did not work as expected: all nodes reported errors, and node-3 kept logging the following warning, apparently still waiting for responses from node-1 and node-2.

[2024-01-25T18:35:03,496][WARN ][o.e.c.c.ClusterFormationFailureHelper] [10.182.14.236] master not discovered or elected yet, an election requires at least 2 nodes with ids from [_lK0MyMIThi09EZu4JtK-g, XXgGV5t3RV2URLlzW8j6KA, LBxstylQSqGYkElkPy7yhQ], have discovered [{node-3}{_lK0MyMIThi09EZu4JtK-g}{AfZEfqWDQMWxLfTLLUVm_A}{node-3}{node-3:9324}{dilmrt}{ml.machine_memory=67383320576, xpack.installed=true, zone=tc, transform.node=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [node-1:9324, node-2:9324] from hosts providers and [{node-3}{_lK0MyMIThi09EZu4JtK-g}{AfZEfqWDQMWxLfTLLUVm_A}{node-3}{node-3:9324}{dilmrt}{ml.machine_memory=67383320576, xpack.installed=true, zone=tc, transform.node=true, ml.max_open_jobs=20}, {node-2}{LBxstylQSqGYkElkPy7yhQ}{dSthdNVFRemt3yDHmDjPIg}{node-2}{node-2:9324}{dilmrt}{ml.machine_memory=67383324672, ml.max_open_jobs=20, xpack.installed=true, zone=dbl, transform.node=true}] from last-known cluster state; node term 2, last-accepted version 37 in term 2

Questions:

  1. Why did the new node configuration not take effect?

  2. How can I quickly recover the cluster in this situation?

Can someone help? Thanks!

You have 3 master-eligible nodes. Since 2 of them went offline, your cluster became unavailable, and you cannot add new nodes while the cluster is unavailable.

Unfortunately, to solve this you need to bring back at least one of the nodes, node-1 or node-2.
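
To illustrate why the replacement nodes did not help, here is a minimal Python sketch of the rule quoted in the log ("an election requires at least 2 nodes with ids from [...]"). It is a simplified model, not Elasticsearch's actual election code. The three voting IDs are copied from the log; the IDs for node-7 and node-8 are hypothetical placeholders, on the assumption that freshly installed nodes get new IDs that are not part of the existing voting set.

```python
# Simplified model of the election rule quoted in the log: a master can only
# be elected by a majority of the node IDs in the last-known voting set.

# Node IDs copied from the log message above.
VOTING_IDS = {
    "_lK0MyMIThi09EZu4JtK-g",  # node-3 (still online)
    "XXgGV5t3RV2URLlzW8j6KA",  # offline (by elimination, presumably node-1)
    "LBxstylQSqGYkElkPy7yhQ",  # node-2 (offline)
}


def votes_needed(voting_ids):
    """Strict majority of the voting set."""
    return len(voting_ids) // 2 + 1


def can_elect_master(voting_ids, reachable_ids):
    """Only reachable nodes whose ID is in the voting set count as votes."""
    votes = len(set(voting_ids) & set(reachable_ids))
    return votes >= votes_needed(voting_ids)


# Hypothetical IDs for the replacement nodes (assumption: new nodes, new IDs).
reachable = {
    "_lK0MyMIThi09EZu4JtK-g",
    "hypothetical-id-node-7",
    "hypothetical-id-node-8",
}

print(votes_needed(VOTING_IDS))                 # 2
print(can_elect_master(VOTING_IDS, reachable))  # False
```

In this model, node-7 and node-8 are reachable but contribute no votes, so node-3 alone stays below the required 2. Adding back either original node (for example the ID of node-2) makes the check pass, which matches the advice above.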
