Index Backup & Archive to tape strategy

lag · November 20, 2018, 8:13pm

Hello!

I am currently trying to work out the right backup and archive strategy for Elasticsearch.

What I would like to achieve is a solution where older indexes that are no longer needed get offloaded to tape. For example, 6 months of active data stays in Elasticsearch and everything older is sent to tape.

There is the Snapshot feature in Elasticsearch, which creates a repository, but that is not exactly an index dump that could be offloaded to tape as a single file and restored in a user-friendly manner. It is more like a service DB for the main DB.

As I don't really see an alternative to the Snapshot feature, here is what I am thinking:

1. Curator runs at the beginning of the month and snapshots last month's indexes (e.g. index-2018-Oct).
2. Backup software collects the snapshot directory and sends it to tape.
3. Curator runs again and deletes snapshots older than X months (thus keeping the last X months in the repository for easier and faster recovery).
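For illustration, the first and third steps can be sketched as the plain snapshot API requests that Curator would issue on your behalf. This is only a sketch under assumptions: the repository name `monthly`, the `index-YYYY.MM` naming pattern, and all function names are hypothetical. A numeric month is used because Elasticsearch requires index and snapshot names to be lowercase. The functions only build the method, path, and body; sending them with an HTTP client is left out.

```python
from datetime import date


def months_back(today: date, n: int) -> tuple[int, int]:
    """Return (year, month) of the month that lies n months before today's month."""
    total = today.year * 12 + (today.month - 1) - n
    return total // 12, total % 12 + 1


def index_name(year: int, month: int, prefix: str = "index") -> str:
    """Hypothetical monthly index name, e.g. index-2018.10."""
    return f"{prefix}-{year}.{month:02d}"


def snapshot_request(repo: str, today: date) -> tuple[str, str, dict]:
    """Build the create-snapshot request for last month's index."""
    name = index_name(*months_back(today, 1))
    # One snapshot per monthly index, named after the index it contains.
    body = {"indices": name, "include_global_state": False}
    return "PUT", f"/_snapshot/{repo}/{name}", body


def expired_snapshot_request(repo: str, today: date, keep_months: int) -> tuple[str, str]:
    """Build the delete-snapshot request for the month that just fell out of retention."""
    name = index_name(*months_back(today, keep_months + 1))
    return "DELETE", f"/_snapshot/{repo}/{name}"
```

With `today = date(2018, 11, 1)` and `keep_months = 6`, this snapshots the October index and removes the April snapshot, leaving May through October in the repository. Deleting through the snapshot API (rather than removing files from the directory) matters, because snapshots in one repository share data files.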

If data from a year ago is required, it is restored from tape into a "new repository" that gets added to the ES nodes.

Having 2 repositories, "Active Backup/Snapshot" and "Restore", seems to be a way to get data to tape and free up disk space without corrupting anything in the long term.

Is there a better way to approach this? How crazy does this sound?

Thank you in advance for your feedback!

DavidTurner · November 20, 2018, 8:26pm

You cannot really operate on anything smaller than a whole repository when copying it elsewhere (e.g. to tape), so would it work better to create a new repository every month? That way you can restore just the months that you need if you want to look at the archives.
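As a rough sketch of what a per-month repository could look like, the function below builds the registration request for a shared-filesystem (`fs`) repository. The `archive-YYYY-MM` naming scheme, the base path `/mnt/es_backups`, and the function name are assumptions made for illustration. The location has to sit under a directory listed in `path.repo` on every node.

```python
def monthly_repository(year: int, month: int,
                       base_path: str = "/mnt/es_backups") -> tuple[str, str, dict]:
    """Build the request that registers one fs repository per month."""
    name = f"archive-{year}-{month:02d}"  # hypothetical naming scheme
    body = {
        "type": "fs",
        "settings": {
            # Each month gets its own directory, so it can go to tape as a unit.
            "location": f"{base_path}/{name}",
            "compress": True,
        },
    }
    return "PUT", f"/_snapshot/{name}", body
```

Because each repository lives in its own directory, the backup software can copy one month at a time and a restore only needs the tapes for the months in question.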

lag · November 20, 2018, 9:17pm

I thought about this initially, but since we store tapes off-site and retrieval costs money and time, I would much rather archive a couple of months at a time (a full backup every month of a repository that contains X months of data), which makes the restore process easier. That would mean one repository that doesn't change should be sufficient.

There is no issue with having the "same" repository mounted twice (under different paths), right? (The "same" repository, but containing different or at least partially different data.)

DavidTurner · November 21, 2018, 8:10am

I see.

I think that's fine, yes.
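A minimal sketch of the restore side, assuming the copy retrieved from tape is placed at a separate path and registered under a second, hypothetical repository name `restore`. Marking it `readonly` is a precaution added here, not something discussed above: it keeps the cluster from writing into the restored copy. Function names and the example path are illustrative only.

```python
def restore_repository(location: str, name: str = "restore") -> tuple[str, str, dict]:
    """Build the request that registers the tape-restored directory as a read-only repository."""
    body = {"type": "fs", "settings": {"location": location, "readonly": True}}
    return "PUT", f"/_snapshot/{name}", body


def restore_request(snapshot: str, indices: str,
                    repo: str = "restore") -> tuple[str, str, dict]:
    """Build the request that restores selected indices from one snapshot."""
    body = {"indices": indices, "include_global_state": False}
    return "POST", f"/_snapshot/{repo}/{snapshot}/_restore", body
```

For example, `restore_repository("/mnt/es_restore")` followed by `restore_request("index-2017.10", "index-2017.10")` would bring back a single month without touching the active snapshot repository.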
