We have deployed an Elasticsearch cluster consisting of 6 nodes with the following roles (the per-node elasticsearch.yml settings are sketched right after the list):
node-1: master-eligible, data
node-2: master-eligible, data
node-3: master-eligible, data
node-4: data
node-5: data
node-6: data
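For reference, the role-related settings in each node's elasticsearch.yml look roughly like this (a sketch assuming an Elasticsearch 7.x cluster, where roles are set with node.master / node.data; exact setting names depend on the version):

# sketch only - node-1, node-2, node-3 (master-eligible + data)
node.master: true
node.data: true

# sketch only - node-4, node-5, node-6 (data only)
node.master: false
node.data: true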
Due to a server-room failure, node-1 and node-2 went offline at the same time, and the cluster became unavailable.
I know that when more than half of the master-eligible nodes in a cluster are lost, the cluster can become unavailable. To get the cluster to start up again, I added two new machines, node-7 and node-8, to replace node-1 and node-2. The configuration is as follows:
cluster.initial_master_nodes: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.ping.unicast.hosts: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.minimum_master_nodes: 2
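For completeness, the elasticsearch.yml I put on node-7 looks roughly like the snippet below (node-8 is the same apart from node.name). The role flags and the transport port shown here are my reconstruction of the intended setup for the replacement masters, not verified output:

# sketch of node-7's elasticsearch.yml (node.name, role flags, and transport.port assumed)
node.name: node-7
node.master: true
node.data: true
transport.port: 9324
cluster.initial_master_nodes: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.ping.unicast.hosts: ["node-1:9324", "node-2:9324", "node-3:9324"]
discovery.zen.minimum_master_nodes: 2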
However, unexpectedly, this did not help: all nodes reported errors, and node-3 kept logging the following warning, apparently still waiting for responses from node-1 and node-2.
[2024-01-25T18:35:03,496][WARN ][o.e.c.c.ClusterFormationFailureHelper] [10.182.14.236] master not discovered or elected yet, an election requires at least 2 nodes with ids from [_lK0MyMIThi09EZu4JtK-g, XXgGV5t3RV2URLlzW8j6KA, LBxstylQSqGYkElkPy7yhQ], have discovered [{node-3}{_lK0MyMIThi09EZu4JtK-g}{AfZEfqWDQMWxLfTLLUVm_A}{node-3}{node-3:9324}{dilmrt}{ml.machine_memory=67383320576, xpack.installed=true, zone=tc, transform.node=true, ml.max_open_jobs=20}] which is not a quorum; discovery will continue using [node-1:9324, node-2:9324] from hosts providers and [{node-3}{_lK0MyMIThi09EZu4JtK-g}{AfZEfqWDQMWxLfTLLUVm_A}{node-3}{node-3:9324}{dilmrt}{ml.machine_memory=67383320576, xpack.installed=true, zone=tc, transform.node=true, ml.max_open_jobs=20}, {node-2}{LBxstylQSqGYkElkPy7yhQ}{dSthdNVFRemt3yDHmDjPIg}{node-2}{node-2:9324}{dilmrt}{ml.machine_memory=67383324672, ml.max_open_jobs=20, xpack.installed=true, zone=dbl, transform.node=true}] from last-known cluster state; node term 2, last-accepted version 37 in term 2
Questions:
- Why did the new node configuration not take effect?
- How can I quickly recover the cluster in this situation?
Can someone help? Thanks!