I am trying to figure out the correct approach to a backup and archive strategy for Elasticsearch.
What I would like to achieve is a solution where older indexes that are no longer required get off-loaded to tape. For example, 6 months of active data stays in Elasticsearch and everything older gets sent to tape.
There is the Snapshot feature in Elasticsearch, which writes into a repository, but that is not exactly an index dump that could be off-loaded to tape as a single file and restored in a user-friendly manner. It's more like a service DB for the main DB.
As I don't really see an alternative to the Snapshot feature, here is what I am thinking:
- Curator runs at the beginning of the month and snapshots last month's indexes (e.g. index-2018-Oct)
- Backup software collects the snapshot repository directory and sends it to tape
- Curator runs again and deletes snapshots of indexes older than X months (now-Xm), so the last X months stay in the repository for easier and faster recovery
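To make the monthly cycle concrete, here is a minimal sketch of the two Elasticsearch requests the Curator steps would boil down to. It is plain Python that only builds the requests and does not send anything, and it is not Curator configuration. The repository name `active_backup`, the `index-YYYY-Mon` naming and the helper names are assumptions for illustration. Names are lowercased because Elasticsearch requires lowercase index and snapshot names.

```python
from datetime import date

# Fixed month abbreviations so the result does not depend on the system locale.
MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]


def months_back(today, n):
    """Return (year, month) of the month that lies n months before today's month."""
    total = today.year * 12 + (today.month - 1) - n
    return total // 12, total % 12 + 1


def index_name(year, month, prefix="index"):
    """Build a monthly index name such as 'index-2018-oct' (assumed naming scheme)."""
    return f"{prefix}-{year}-{MONTH_ABBR[month - 1]}"


def plan_monthly_run(today, keep_months, repo="active_backup"):
    """Return (method, path, body) tuples for one monthly run.

    Step 1 snapshots last month's index; step 3 deletes the snapshot that has
    just fallen out of the keep_months window. Step 2 (tape) happens outside
    Elasticsearch, between these two requests.
    """
    new_snapshot = index_name(*months_back(today, 1))
    expired_snapshot = index_name(*months_back(today, keep_months + 1))
    return [
        ("PUT",
         f"/_snapshot/{repo}/{new_snapshot}?wait_for_completion=true",
         {"indices": new_snapshot, "include_global_state": False}),
        ("DELETE", f"/_snapshot/{repo}/{expired_snapshot}", None),
    ]


if __name__ == "__main__":
    for method, path, body in plan_monthly_run(date(2018, 11, 1), keep_months=6):
        print(method, path, body)
```

With a run date of 1 November 2018 and X = 6, this snapshots the October index and deletes the April snapshot, leaving May through October in the repository. Since snapshots in one repository share segment files, the tape copy in the second step would presumably need to capture the whole repository directory while no snapshot is being written or deleted.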
If data from a year ago is required, the archived snapshot data is restored from tape into a new repository that gets added to the ES nodes.
Having two repositories, "Active Backup/Snapshot" and "Restore", seems to be a way to get data to tape and free up disk space without corrupting anything in the long term.
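The restore path could look roughly like the sketch below: once the backup software has put the archived repository directory back on disk, it is registered as a second, read-only repository and the index is restored from it. Again this only builds the requests. The repository name `restore`, the path `/mnt/restore_repo` and the snapshot name are hypothetical, and the location is assumed to be a shared filesystem path listed under `path.repo` on every node.

```python
import json


def restore_requests(snapshot, location, repo="restore"):
    """Return (method, path, body) tuples to register a repository and restore from it."""
    register = (
        "PUT",
        f"/_snapshot/{repo}",
        # readonly keeps the cluster from writing into the copy that came back from tape
        {"type": "fs", "settings": {"location": location, "readonly": True}},
    )
    restore = (
        "POST",
        f"/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true",
        # assumes the snapshot is named after the single index it contains
        {"indices": snapshot, "include_global_state": False},
    )
    return [register, restore]


if __name__ == "__main__":
    for method, path, body in restore_requests("index-2017-oct", "/mnt/restore_repo"):
        print(method, path, json.dumps(body))
```

Keeping the restore repository separate and read-only means the active repository is never mixed with files that came back from tape.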
Is there a better way to approach this? How crazy does this sound?
Thank you in advance for your feedback!