I am currently trying to figure out the correct approach to a backup and archive strategy for Elasticsearch.

What I would like to achieve is a solution where older indices that are no longer required get off-loaded to tape. For example, 6 months of active data stay in Elasticsearch and everything else gets sent to tape.

There is the Snapshot feature in Elasticsearch, which creates a repository, but it's not exactly an index dump that could be off-loaded to tape as a single file and restored in a user-friendly manner. It's more like a service DB for the main DB.

As I don't really see an alternative to the Snapshot feature, here is what I am thinking:

  1. Curator runs at the beginning of the month and snapshots last month's indices (e.g. index-2018-Oct)
  2. Backup software collects the snapshot repository directory and sends it to tape
  3. Curator runs again and deletes snapshots of indices older than X months (thus keeping the last X months in the repository for easier and faster recovery)
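To make the monthly cycle concrete, here is a minimal sketch of the snapshot API calls that steps 1 and 3 boil down to. It only builds the requests and does not send them. The repository name `active_backup`, the helper names, and the index naming pattern (copied from the `index-2018-Oct` example above) are assumptions for illustration; Curator's own action files are not shown.

```python
from datetime import date

MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def months_back(today, n):
    """Return (year, month) for the month n months before today's month."""
    total = today.year * 12 + (today.month - 1) - n
    return total // 12, total % 12 + 1

def index_name(year, month, prefix="index"):
    # Hypothetical naming pattern, taken from the example in the post.
    return f"{prefix}-{year}-{MONTH_ABBR[month - 1]}"

def snapshot_request(today, repo="active_backup"):
    """Step 1: snapshot last month's index into the active repository."""
    index = index_name(*months_back(today, 1))
    # Snapshot names must be lowercase, so the index name is lowercased here.
    path = f"/_snapshot/{repo}/{index.lower()}?wait_for_completion=true"
    return "PUT", path, {"indices": index, "include_global_state": False}

def delete_request(today, keep_months, repo="active_backup"):
    """Step 3: delete the snapshot that just fell out of the X-month window."""
    index = index_name(*months_back(today, keep_months + 1))
    return "DELETE", f"/_snapshot/{repo}/{index.lower()}", None

if __name__ == "__main__":
    run_day = date(2018, 11, 1)
    print(snapshot_request(run_day))
    print(delete_request(run_day, keep_months=3))
```

The delete call should only go out after the tape backup in step 2 has finished, otherwise the copy on tape may not match what the repository metadata describes.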

If data from a year ago is required, it is restored from tape into a "new repository" that gets added to the ES nodes.

Having 2 repositories, "Active Backup/Snapshot" and "Restore", seems to be a way to get data to tape and free up disk without corrupting anything in the long term.

Is there a better way to approach this? How crazy does this sound?
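For the "Restore" side, a rough sketch of the two calls involved: registering the directory recovered from tape as a second shared-filesystem repository, then restoring an index from it. The repository name, path, and helper names are made up for illustration. Marking the restore repository `readonly` is my assumption about what is wanted here, since nothing should write to a copy that came back from tape. The location also has to be listed under `path.repo` on every node.

```python
def register_restore_repo(name, location):
    """Register a directory restored from tape as a read-only fs repository."""
    body = {
        "type": "fs",
        "settings": {"location": location, "readonly": True},
    }
    return "PUT", f"/_snapshot/{name}", body

def restore_request(repo, snapshot, indices):
    """Restore the given indices from one snapshot in the repository."""
    body = {"indices": indices, "include_global_state": False}
    return "POST", f"/_snapshot/{repo}/{snapshot}/_restore", body

if __name__ == "__main__":
    # Hypothetical repository name and mount point.
    print(register_restore_repo("restore", "/mnt/es_restore"))
    print(restore_request("restore", "index-2017-oct", "index-2017-Oct"))
```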

Thank you for feedback in advance!

You cannot really operate on anything smaller than a whole repository when copying it elsewhere (e.g. to tape), so would it work better to create a new repository every month? That way you can restore just the months that you need if you want to look at the archives.
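As a sketch of what one repository per month could look like, the helper below builds the registration request for a month-specific shared-filesystem repository. The `archive_YYYY_MM` naming and the base path are assumptions, not anything Elasticsearch prescribes.

```python
def monthly_repo_request(year, month, base_path="/mnt/es_backups"):
    """Build the request that registers a separate fs repository for one month."""
    name = f"archive_{year}_{month:02d}"  # hypothetical naming scheme
    body = {"type": "fs", "settings": {"location": f"{base_path}/{name}"}}
    return "PUT", f"/_snapshot/{name}", body

if __name__ == "__main__":
    print(monthly_repo_request(2018, 10))
```

Each month's directory can then go to tape as a self-contained unit and be registered again later on its own.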


I thought about this initially, but considering we store tapes off-site and retrieval costs money and time, I would much rather archive a couple of months at a time (a full backup every month of a repository that contains X months of data), which makes the restore process easier. That would mean one repository that doesn't change should be sufficient.

There is no issue with having the "same" repository mounted twice (different paths), right? ("Same" repository, but containing different or at least partially different data.)

I see.

I think that's fine, yes.

