I'm looking to understand the behaviour of Elasticsearch snapshot restore under differing circumstances. Try as I might I've been unable to find concrete answers.
I have two identical Elasticsearch clusters. One is the primary and one is the standby.
I regularly take snapshots of the primary cluster and they are stored in a repo on an NFS share.
I restore a snapshot from the primary to the standby.
I restore the snapshot from the primary on the standby again a day later, or even a month later.
I restore multiple times by closing the indices.
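For reference, the close-then-restore cycle described above could be scripted roughly as follows. This is only a sketch: it uses the standard `_close` and `_snapshot/<repository>/<snapshot>/_restore` REST endpoints, while the repository name `nfs_repo`, the snapshot name `snapshot_1`, the index names, and the helper functions are placeholders made up for illustration.

```python
import json
import urllib.request


def build_restore_requests(repo, snapshot, indices):
    """Return the (method, path, body) triples for one close-then-restore cycle."""
    index_list = ",".join(indices)
    return [
        # An index that already exists on the standby must be closed
        # before it can be restored over.
        ("POST", f"/{index_list}/_close", None),
        # Restore only the named indices and leave cluster-wide state alone.
        (
            "POST",
            f"/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true",
            {"indices": index_list, "include_global_state": False},
        ),
    ]


def send(base_url, method, path, body=None):
    """Send one request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder names; print the requests instead of sending them.
    for method, path, body in build_restore_requests(
        "nfs_repo", "snapshot_1", ["logs", "users"]
    ):
        print(method, path, body)
```

To run this against a real standby, each triple would be passed to `send` with the cluster's base URL. The close request will fail for any index that does not yet exist on the standby, so a first-time restore would skip that step.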
Questions:
When I restore am I only getting the latest changes from the snapshot? It would seem the restore is much quicker the second time.
How common is this practice and are there any edge cases I should know about?
What happens if an index has changed since the last restore? Is it completely replaced, or are only parts of it replaced?
What happens if the standby cluster has indices, or objects in indices, that no longer exist in the newest snapshot? Are they deleted and rewritten? Are only the files that don't exist in the snapshot removed? Are any objects or indices ever removed in a restore?
In general how are diffs / conflicts handled in the restore process?
This article is helpful and I have had a look over it. It covers a lot about repeated snapshot use, but it only very lightly touches on restoring over existing indices, where it says that they must be closed. So it doesn't answer my questions above, which are primarily about restore.
When I restore into a new cluster, over closed indices that are either out of date or have changed altogether, what happens to objects or indices that are in the cluster but not in the snapshot?
Is there ever a case where I restore a snapshot over closed indices and I end up with an index that is not representative of the original snapshot?
Mostly I need to understand: when a cluster and a "standby cluster" diverge in unexpected ways, such as indices or objects being added or removed, how are all of those diffs resolved?
When you successfully restore an index from a snapshot, the state of the index after the restore is that it contains exactly the segments that were in the snapshot - no more, no less.
If there are segments on disk in the cluster that are part of the (closed) index of the same name that are not part of the named snapshot, they will no longer exist after the restore.
If there are segments on disk in the cluster that are part of the (closed) index of the same name that are part of the named snapshot, they will exist after the restore - without needing to have been copied again.
For this reason, we frequently refer to a snapshot as a "restore point".
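The three statements above amount to simple set arithmetic over segment files. The toy model below illustrates that reasoning only; it is not Elasticsearch's actual implementation. It assumes, as a simplification, that a file name fully identifies a segment file's content, and the function name and file names are invented for the example.

```python
def plan_restore(on_disk, in_snapshot):
    """Classify segment files for a restore over a closed index.

    Simplifying assumption: a file name uniquely identifies its content.
    """
    on_disk, in_snapshot = set(on_disk), set(in_snapshot)
    return {
        # Already present locally and part of the snapshot: kept, not copied again.
        "reused": sorted(on_disk & in_snapshot),
        # Part of the snapshot but missing locally: copied from the repository.
        "copied": sorted(in_snapshot - on_disk),
        # Present locally but not part of the snapshot: gone after the restore.
        "deleted": sorted(on_disk - in_snapshot),
    }


# Hypothetical segment files on the standby and in the chosen snapshot.
standby = {"_0.cfs", "_1.cfs", "_2.cfs"}
snapshot = {"_1.cfs", "_2.cfs", "_3.cfs"}

plan = plan_restore(standby, snapshot)

# After the restore, the index holds exactly the snapshot's segments.
after = (standby - set(plan["deleted"])) | set(plan["copied"])
assert after == snapshot
```

In this model a repeat restore is quick when most segments fall into the "reused" set, which is consistent with the second restore being much faster than the first.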