I have been reading about the snapshot and restore functionality in Elasticsearch. Coming from a relational database background, it took me some effort to understand it correctly. Based on that understanding, I am trying to finalize the snapshot strategy for our production Elasticsearch cluster. We do not have time-series indices currently (though that might change in the near future).
I will be using Curator to create new snapshots and delete old ones. The snapshots will be stored on Amazon S3.
I plan to have three repositories: hourly, daily and weekly, with a separate Curator action backing up to each one. Cron jobs will run the Curator actions on the matching schedule: one snapshot every hour, one every day and one every week.
To delete old snapshots, there will be Curator delete actions that remove snapshots older than 24 hours from the hourly repo, older than 7 days from the daily repo and older than 4 weeks from the weekly repo.
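To make the retention rules concrete, here is a rough Python sketch of the selection logic I expect the delete actions to implement. This is not Curator's API: the function name, the snapshot names and the dictionary layout are all made up for illustration, and only the three retention windows come from the plan above.

```python
from datetime import datetime, timedelta

# Retention windows from the plan above, keyed by (assumed) repository name.
RETENTION = {
    "hourly": timedelta(hours=24),
    "daily": timedelta(days=7),
    "weekly": timedelta(weeks=4),
}


def snapshots_to_delete(repo, snapshots, now):
    """Return the names of snapshots in `repo` older than its retention window.

    `snapshots` maps a (hypothetical) snapshot name to its creation time.
    """
    cutoff = now - RETENTION[repo]
    return sorted(name for name, created in snapshots.items() if created < cutoff)


# Example: one snapshot inside the 24-hour window, one outside it.
now = datetime(2024, 1, 31, 12, 0)
hourly = {
    "hourly-2024.01.31-11": datetime(2024, 1, 31, 11, 0),
    "hourly-2024.01.30-11": datetime(2024, 1, 30, 11, 0),
}
print(snapshots_to_delete("hourly", hourly, now))  # ['hourly-2024.01.30-11']
```

In this sketch a snapshot is removed only when it is strictly older than the cutoff, so a snapshot taken exactly 24 hours ago would be kept until the next run.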
I have the following questions:
- Will this backup strategy cause any performance issues when creating/deleting snapshots?
- Having 3 repos vs having a single repo and using different name patterns - which is advisable?
- For an overall Elasticsearch backup strategy, is there any "good practices" documentation that anybody can point me to?
- Does this backup strategy fall in the "optimal" category, or is it overkill?
Thanks!