Salvage data from a single node that once belonged to a cluster

yk928 · July 18, 2020, 8:58am

I had an ES cluster of several nodes (say es1, es2, ...), and I have a disk snapshot, taken at some point in the past, of the disk of a single ES node (es1).
I'd like to dump all the data from this backup of es1.
I can start a Linux instance (say, esX) from this backup, but the ES instance on esX won't start because it once belonged to an ES cluster.
I don't have backups of es2, es3, ..., so I can't start all the nodes to rebuild the cluster.
How can I recover the data in this situation?

--

My guess is that either 1) there is some way to force the ES instance to start by skipping master discovery, election, or the like, or 2) there is some way to dump the data from /var/lib/elasticsearch directly, without starting an ES instance at all.
But I can't find a way to do either of these.

Christian_Dahlqvist · July 18, 2020, 9:02am

Which version are you using?

yk928 · July 18, 2020, 9:06am

{
  "number" : "7.7.0",
  "build_flavor" : "default",
  "build_type" : "deb",
  "build_hash" : "81a1e9eda8e6183f5237786246f6dced26a10eaf",
  "build_date" : "2020-05-12T02:01:37.602180Z",
  "build_snapshot" : false,
  "lucene_version" : "8.5.1",
  "minimum_wire_compatibility_version" : "6.8.0",
  "minimum_index_compatibility_version" : "6.0.0-beta1"
}

DavidTurner · July 18, 2020, 10:00am

What use will it be if you salvage the data from this single node? Elasticsearch doesn't store everything on every node, so whatever you salvage will be very incomplete and you likely won't even be able to tell what's missing.

yk928 · July 18, 2020, 10:31am

That's fine. I just want to recover as much as possible.
Plus, IIRC number_of_replicas was set high enough that a copy of every shard was on es1.

Any help would be appreciated...
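
As a side note on the replica claim above, the arithmetic can be sketched as follows. This is only an illustration under stated assumptions: every node is a data node, no allocation filtering is in use, and every shard copy that could be assigned was assigned. The function names are hypothetical and not part of any Elasticsearch API.

```python
def copies_per_shard(number_of_replicas: int) -> int:
    """Each shard exists as one primary plus `number_of_replicas` replicas."""
    return 1 + number_of_replicas


def every_node_has_every_shard(node_count: int, number_of_replicas: int) -> bool:
    """Return True if each node can hold a copy of every shard.

    Elasticsearch does not place two copies of the same shard on one node,
    so a node is guaranteed a copy of every shard only when there are at
    least as many copies as nodes (assumptions listed in the text above).
    """
    return copies_per_shard(number_of_replicas) >= node_count


if __name__ == "__main__":
    # Hypothetical 3-node cluster: 2 replicas are needed for full coverage.
    print(every_node_has_every_shard(node_count=3, number_of_replicas=2))
    print(every_node_has_every_shard(node_count=3, number_of_replicas=1))
```

If the setting was lower than the node count minus one, es1 would hold only a subset of the shards.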

DavidTurner · July 18, 2020, 11:44am

The only reasonable way forward is to start again and replay your data from its original source into a new cluster. There's no value in filesystem-level backups of Elasticsearch nodes. Quoting the docs:

You cannot back up an Elasticsearch cluster by simply copying the data directories of all of its nodes. [...] The only reliable way to back up a cluster is by using the snapshot and restore functionality.

Although you have a disk snapshot rather than a simple copy, this statement is still fundamentally true.
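
For readers unfamiliar with it, the snapshot and restore functionality mentioned in the quote is driven through the `_snapshot` REST API. The following is a minimal sketch, not a recommended procedure from this thread: the cluster URL, the repository name `my_backup`, the snapshot name `snap_1`, and the location `/mnt/es_backups` are all assumptions, and a shared-filesystem repository location must also be listed under `path.repo` in `elasticsearch.yml`. The code only builds the requests; nothing is sent.

```python
import json
import urllib.request

# Assumption: a cluster reachable without authentication at this address.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def register_fs_repository(repo, location):
    """PUT /_snapshot/<repo>: register a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return build_request("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot):
    """PUT /_snapshot/<repo>/<snapshot>: take a snapshot and wait for it."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return build_request("PUT", path)


def restore_snapshot(repo, snapshot):
    """POST /_snapshot/<repo>/<snapshot>/_restore: restore into a cluster."""
    return build_request("POST", f"/_snapshot/{repo}/{snapshot}/_restore")


if __name__ == "__main__":
    requests_to_send = [
        register_fs_repository("my_backup", "/mnt/es_backups"),
        create_snapshot("my_backup", "snap_1"),
        restore_snapshot("my_backup", "snap_1"),
    ]
    for req in requests_to_send:
        # To actually send a request: urllib.request.urlopen(req)
        print(req.get_method(), req.full_url)
```

Snapshots taken this way can be restored into a fresh cluster, which a raw copy of one node's data directory cannot.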

yk928 · July 18, 2020, 11:50am

Thanks for your reply.
Isn't there any way at all to get my data back even partially?

DavidTurner · July 18, 2020, 12:07pm

I have no other recommendations, sorry. Logically the data was lost when the cluster failed without any proper snapshots.

yk928 · July 18, 2020, 12:22pm

OK, I understand. Thanks. I'll wait a while in case someone has hack-ish advice.

Christian_Dahlqvist · July 18, 2020, 1:19pm

Resiliency has been improved in 7.x, which means more checks and controls. This leaves less room for hacky solutions than in earlier versions. If David cannot suggest a solution, I would bet that none exists.

system · August 15, 2020, 1:19pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
