We have deployed an Elasticsearch cluster consisting of 6 nodes with the following roles (the per-node elasticsearch.yml settings are sketched right after the list):
node-1: master-eligible, data
node-2: master-eligible, data
node-3: master-eligible, data
node-4: data
node-5: data
node-6: data
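For reference, the role-related settings in each node's elasticsearch.yml look roughly like this (a sketch assuming an Elasticsearch 7.x cluster, where roles are set with node.master / node.data; exact setting names depend on the version):

# sketch only - node-1, node-2, node-3 (master-eligible + data)
node.master: true
node.data: true

# sketch only - node-4, node-5, node-6 (data only)
node.master: false
node.data: true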
Due to a server-room failure, node-1 and node-2 went offline at the same time, and the cluster became unavailable.
I know that when more than half of the master-eligible nodes in a cluster are lost, the cluster can become unavailable. To get the cluster to start up again, I added two new machines, node-7 and node-8, to replace node-1 and node-2. The configuration is as follows:
cluster.initial_master_nodes: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.ping.unicast.hosts: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.minimum_master_nodes: 2
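For completeness, the elasticsearch.yml I put on node-7 looks roughly like the snippet below (node-8 is the same apart from node.name). The role flags and the transport port shown here are my reconstruction of the intended setup for the replacement masters, not verified output:

# sketch of node-7's elasticsearch.yml (node.name, role flags, and transport.port assumed)
node.name: node-7
node.master: true
node.data: true
transport.port: 9324
cluster.initial_master_nodes: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.ping.unicast.hosts: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.minimum_master_nodes: 2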
However, unexpectedly, this did not help: all nodes reported errors, and node-3 kept logging the following warning, apparently still waiting for responses from node-1 and node-2.
[2024-01-25T18:35:03,496][WARN ][o.e.c.c.ClusterFormationFailureHelper] [10.182.14.236] master not discovered or elected yet, an election requires at least 2 nodes with ids from [_lK0MyMIThi09EZu4JtK-g, XXgGV5t3RV2URLlzW8j6KA, LBxstylQSqGYkElkPy7yhQ], have discovered [{node-3}{_lK0MyMIThi09EZu4JtK-g}{AfZEfqWDQMWxLfTLLUVm_A}{node-3}{node-3:9324}{dilmrt}{ml.machine_memory=67383320576, xpack.installed=true, zone=tc, transform.node=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [node-1:9324, node-2:9324] from hosts providers and [{node-3}{_lK0MyMIThi09EZu4JtK-g}{AfZEfqWDQMWxLfTLLUVm_A}{node-3}{node-3:9324}{dilmrt}{ml.machine_memory=67383320576, xpack.installed=true, zone=tc, transform.node=true, ml.max_open_jobs=20}, {node-2}{LBxstylQSqGYkElkPy7yhQ}{dSthdNVFRemt3yDHmDjPIg}{node-2}{node-2:9324}{dilmrt}{ml.machine_memory=67383324672, ml.max_open_jobs=20, xpack.installed=true, zone=dbl, transform.node=true}] from last-known cluster state; node term 2, last-accepted version 37 in term 2
Questions:
- Why did the new node configuration not take effect?
- How can I quickly recover the cluster in this situation?
Can someone help? Thanks!