Recover a broken 3 node elasticsearch cluster that has only 1 node left

efan · August 14, 2020, 1:49am

Hi, I have an Elasticsearch cluster (7.6.1) that had 3 nodes. 2 of the nodes were accidentally dropped while all 3 servers were running, and only 1 of the 3 servers survived. I want to save the cluster and its data. How do I add new nodes to it? The current cluster status is:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

Steve_Mushero · August 14, 2020, 2:41am

Let me guess: the surviving node was not a master? Even if it was (and the others were too), you lost more than 50% of your master-eligible nodes, so the survivor can't win an election and the cluster can't recover.
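To make the election arithmetic concrete, here is a minimal sketch. It assumes all three original nodes were master-eligible and part of the voting configuration, which the thread does not confirm; the function names are made up for illustration and are not part of any Elasticsearch API.

```python
def votes_needed(voting_nodes: int) -> int:
    """Smallest strict majority of the voting configuration."""
    return voting_nodes // 2 + 1


def can_elect_master(voting_nodes: int, reachable: int) -> bool:
    """True if the reachable master-eligible nodes form a strict majority."""
    return reachable >= votes_needed(voting_nodes)


if __name__ == "__main__":
    # Assumed situation from this thread: 3 voting nodes, 1 survivor.
    print(votes_needed(3))         # 2
    print(can_elect_master(3, 1))  # False
    print(can_elect_master(3, 2))  # True
```

Under that assumption, a lone survivor holds 1 of the 2 votes it needs, which would explain the master_not_discovered_exception above.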

Others may have more ideas, but I don't think there is a recovery path from here, as you can't get an elected master and without that, can't change things, find data, etc. No elected master = no cluster = no data

It would be nice if there were more recovery methods, such as to tell an old master-eligible node to consider itself the single master with its stored state, and let a recovery proceed from there with any existing nodes, etc, but alas, no one has built that yet, as far as I know.

Elasticsearch does a lot to protect your data, but once it breaks, it really breaks, and it has little in the way of tools or methods to recover even part of the lost data (e.g. segment export, bottom-up metadata rebuild, etc.). Maybe some day (we used to have/build these for old DB/code systems). And maybe enterprise-level support has special tools.

Hope you have a snapshot ... and that others have ideas.

DavidTurner · August 14, 2020, 6:11am

Yep, snapshots are the answer here, or else build a fresh cluster and index the data from its original source again.

The kinds of low-level tools you are describing are very unsafe and users often misunderstand what their weak guarantees mean for the integrity of the data they claim to recover. No tool can protect you from every kind of disaster or mistake, so you have to take snapshots anyway if you care about your data, but if you have snapshots then there's little value in lower-level "rescue" tooling.

You get better support for sure, but there's no magical secret tool to fix this kind of disaster if you pay enough money. I mean, there are snapshots, they're magic, but also free to use.

Steve_Mushero · August 14, 2020, 7:32am

Generally agreed, but frankly at some level and if I was running & paying for an enterprise product at scale, I'd want more tools for recovery, even partial for real disasters & situations - at least for loss of masters/voting where data may not really be lost, but it's considered lost (like split brain).

Surely educate people, but also trust that senior people with larger systems and enterprises can judge risk and tooling at various levels of need, etc. Especially as systems get larger and even snapshot recovery times get really long. In many ways ES is more reliable but also more brittle (the lack of support for two-zone clusters still irks me).

A lot of energy was poured into these things in the RDBMS world going back decades, 3rd party tools, internal details, disaster recovery systems, etc. for what's mostly a single use case (DBMS), but I feel Elasticsearch is more powerful & flexible, but overly insular in some ways.

Just my overall feel, as this becomes more important as it gets used for more things, for more data, and for more mission-critical systems.

efan · August 14, 2020, 1:35pm

Thank you all for the insight. It really helps. Much appreciated. We don't have snapshots. So we will try to rebuild the cluster. I admit that not being able to recover from the remaining node by making it a new master is disappointing.

Steve_Mushero · August 15, 2020, 2:00am

Snapshots are super easy to set up and work very well. They are mostly incremental, so they're quite quick, and they can be pushed to files, S3, etc., so I suggest you get them going as soon as you can. Really the nicest backup system there is, in my opinion.
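As a rough sketch of what getting snapshots going can look like, the script below builds the two REST calls involved: registering a shared-filesystem repository and taking a snapshot. The host, the repository name `my_backup`, the snapshot name `snapshot_1`, and the path `/mnt/es_backups` are placeholders I made up; the path would also have to be listed under `path.repo` in `elasticsearch.yml` on every node, and a secured cluster would need credentials that are not shown here.

```python
import json
import urllib.request

# Assumed for illustration: an unsecured node reachable at localhost:9200.
ES_URL = "http://localhost:9200"


def register_fs_repo(repo: str, location: str) -> urllib.request.Request:
    """Build the PUT _snapshot/<repo> request for a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot(repo: str, name: str) -> urllib.request.Request:
    """Build the PUT _snapshot/<repo>/<name> request, waiting for completion."""
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{repo}/{name}?wait_for_completion=true",
        method="PUT",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON reply (needs a running cluster)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only print the requests here; pass each one to send() against a live cluster.
    for req in (
        register_fs_repo("my_backup", "/mnt/es_backups"),
        create_snapshot("my_backup", "snapshot_1"),
    ):
        print(req.get_method(), req.full_url)
```

Running the second call on a schedule (cron, or snapshot lifecycle management in recent 7.x releases) is what makes it a backup rather than a one-off copy.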
