How do people assert the veracity of their snapshots?

We have over 12TB of data in our ES cluster. Most of that is made up of two really massive indices.

We take snapshots to an S3 repo - incremental every night, full snapshots once a month.
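For reference, a setup like ours might be driven by something roughly like the sketch below; the repository name, bucket, snapshot name, and index names are placeholders rather than our actual configuration:

```python
import requests

ES = "http://localhost:9200"  # placeholder address for the snapshotting cluster

# Register (or update) the S3 repository; needs the repository-s3 plugin/module.
requests.put(f"{ES}/_snapshot/s3_backups", json={
    "type": "s3",
    "settings": {"bucket": "my-es-snapshots", "base_path": "nightly"},
}).raise_for_status()

# Take a snapshot of the two big indices and wait for it to finish.
resp = requests.put(
    f"{ES}/_snapshot/s3_backups/snap-2024-01-01",
    params={"wait_for_completion": "true"},
    json={"indices": "big-index-1,big-index-2", "include_global_state": False},
)
resp.raise_for_status()
print(resp.json()["snapshot"]["state"])  # "SUCCESS" if every shard was copied
```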

We're starting to wonder how to verify these snapshots. How do other people do this? How can we be sure that they:

  1. will actually restore and
  2. will contain the right indices and
  3. will contain the right number of documents per index (plus or minus a given amount of acceptable data loss)

It feels like, to test this properly, we'd have to spin up a four-node cluster and restore into it. Automating that seems like a pain.
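Something like the Python sketch below is roughly what we'd have to automate; the test-cluster host, snapshot name, index names, expected counts, and tolerance here are all made-up placeholders:

```python
import requests

TEST_ES = "http://restore-test:9200"   # assumed throwaway test cluster
REPO, SNAPSHOT = "s3_backups", "snap-2024-01-01"
EXPECTED = {"big-index-1": 1_200_000_000, "big-index-2": 800_000_000}
TOLERANCE = 0.001  # accept 0.1% drift from the recorded counts

# Kick off the restore on the test cluster and block until it completes.
requests.post(
    f"{TEST_ES}/_snapshot/{REPO}/{SNAPSHOT}/_restore",
    params={"wait_for_completion": "true"},
    json={"indices": ",".join(EXPECTED)},
).raise_for_status()

# Check that each expected index came back and its document count is close
# enough to what we recorded when the snapshot was taken.
failures = []
for index, expected in EXPECTED.items():
    r = requests.get(f"{TEST_ES}/{index}/_count")
    if r.status_code != 200:
        failures.append(f"{index}: missing after restore")
        continue
    count = r.json()["count"]
    if abs(count - expected) > expected * TOLERANCE:
        failures.append(f"{index}: got {count}, expected ~{expected}")

if failures:
    raise SystemExit("Snapshot verification failed:\n" + "\n".join(failures))
print("Snapshot restored and verified.")
```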

How do other people do it?

Yes, completely restoring a backup is, in general, the only way to verify beyond doubt that it can be restored. Furthermore, automating the restore process is a very good way to make sure there are no gotchas waiting to bite you if you ever truly need to recover from a disaster.

You might not need such a powerful cluster to accept the restore: you won't be running many searches against it, you can set number_of_replicas: 0 to keep the storage requirements down, and you can use the cheapest possible storage since performance shouldn't be much of an issue.
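For example, a restore request can drop the replicas on the way in by overriding the index settings; a minimal sketch, assuming placeholder host, repository, snapshot, and index names:

```python
import requests

# Restore into the test cluster with replicas forced to 0, so only primary
# shards need disk space.
requests.post(
    "http://restore-test:9200/_snapshot/s3_backups/snap-2024-01-01/_restore",
    params={"wait_for_completion": "true"},
    json={
        "indices": "big-index-1,big-index-2",
        "index_settings": {"index.number_of_replicas": 0},
    },
).raise_for_status()
```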

You may like to know that all the files in a snapshot are verified (by checksum) on the way to and from the repository. If Elasticsearch reports that the restore has succeeded, you can be pretty confident that you have a faithful copy of the indices you snapshotted.
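As a cheaper first check before a full test restore, you can also ask the cluster that took the snapshot whether the snapshot itself completed cleanly; a rough sketch with placeholder host, repository, and snapshot names:

```python
import requests

info = requests.get(
    "http://localhost:9200/_snapshot/s3_backups/snap-2024-01-01"
).json()["snapshots"][0]

# "SUCCESS" means every shard made it into the snapshot; "PARTIAL" means some
# shards failed, with details listed under "failures".
print(info["state"], info["shards"])
```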

