I had an ES cluster of several nodes (let's say es1, es2, ...), and I have a disk snapshot, taken at some point in the past, of a single ES node (es1).
I'd like to dump all the data from this es1 backup.
I can start a Linux instance (say, esX) from this backup, but the ES instance on esX won't start because it once belonged to an ES cluster.
I don't have backups for es2, es3, ..., so I can't start all the nodes to rebuild the cluster.
How can I salvage the data in this situation?
--
My guess is that 1) maybe there's some way to force-start the ES instance by skipping master discovery, the election process, or the like, or 2) maybe there's some way to dump the data from /var/lib/elasticsearch directly, without starting an ES instance at all.
But I can't find any way to do either of these.
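To be concrete about option 2, what I have in mind is opening a shard's Lucene index straight out of the data directory and dumping the stored _source field. This is an untested sketch on my part: the shard path is a placeholder, the Lucene dependency has to match the version bundled with the ES release that wrote the data, and it may fail if the index uses Elasticsearch-specific codecs.

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

import java.nio.file.Paths;

public class DumpSource {
    public static void main(String[] args) throws Exception {
        // Path to one shard's Lucene index inside the node's data directory.
        // The exact layout depends on the ES version; this path is just an example.
        String shardIndex = "/var/lib/elasticsearch/nodes/0/indices/<index-uuid>/0/index";

        try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(shardIndex)))) {
            // Iterate over every document in the shard; documents marked as deleted
            // may also show up here.
            for (int i = 0; i < reader.maxDoc(); i++) {
                // Elasticsearch keeps the original JSON in the stored "_source" field.
                BytesRef source = reader.document(i).getBinaryValue("_source");
                if (source != null) {
                    System.out.println(source.utf8ToString());
                }
            }
        }
    }
}
```

Even if something like this works, it would have to be pointed at each shard directory on es1 in turn, which is where I'm stuck.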
What use will it be if you salvage the data from this single node? Elasticsearch doesn't store everything on every node, so whatever you salvage will be very incomplete and you likely won't even be able to tell what's missing.
The only reasonable way forward is to start again and replay your data from its original source into a new cluster. There's no value in filesystem-level backups of Elasticsearch nodes. Quoting the docs:
You cannot back up an Elasticsearch cluster by simply copying the data directories of all of its nodes. [...] The only reliable way to back up a cluster is by using the snapshot and restore functionality.
Although you have a disk snapshot rather than a simple copy, this statement is still fundamentally true.
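For the new cluster, set up snapshots from the start. As a minimal sketch (the repository name, location, and host are placeholders, and the location must be listed under path.repo in elasticsearch.yml on every node), registering a shared-filesystem repository and taking a snapshot via the low-level Java REST client looks roughly like this:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class SnapshotSetup {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Register a shared-filesystem snapshot repository.
            Request createRepo = new Request("PUT", "/_snapshot/my_backup");
            createRepo.setJsonEntity("{\"type\": \"fs\", \"settings\": {\"location\": \"/mnt/es_backups\"}}");
            client.performRequest(createRepo);

            // Snapshot the whole cluster and wait for it to complete.
            Request takeSnapshot = new Request("PUT", "/_snapshot/my_backup/snapshot_1?wait_for_completion=true");
            client.performRequest(takeSnapshot);
        }
    }
}
```

Restoring is then a matter of pointing a cluster at the same repository and calling the restore API on the snapshot.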
Resiliency has been improved in 7.x, which means more checks and controls, and that leaves less room for hacky workarounds than in earlier versions. If David can't suggest a solution, I'd bet one doesn't exist.