Snapshot strategy

I have been reading about the snapshot and restore functionality in Elasticsearch. Coming from a relational database background, I needed some effort to understand it correctly. Based on my understanding, I am trying to finalize the snapshot strategy for our production Elasticsearch cluster. We do not have time-series indices currently (that might change in the near future).
I will be using Curator to create new snapshots and delete older ones. The snapshots will be stored on Amazon S3.
I will have three repos: hourly, daily and weekly, with separate Curator actions that back up to each of them. Cron jobs will be set up to run the Curator actions as follows: one snapshot every hour, one every day and one every week.
As for deleting older snapshots, there will be Curator delete actions that remove snapshots older than 24 hours from the hourly repo, older than 7 days from the daily repo and older than 4 weeks from the weekly repo.
I have the following questions:

  • Will this backup strategy cause any performance issues when creating/deleting snapshots?
  • Having 3 repos vs having a single repo and using different name patterns - which is advisable?
  • For an overall Elasticsearch backup strategy, is there any "good practices" documentation that anybody can refer to?
  • Does this backup strategy fall in the "optimal" category or is it overkill?

Thanks!

It could, depending on how much activity the cluster sees on an ongoing basis and how long each snapshot takes to complete. Another potential problem is "collisions": if you try to take a snapshot while another (hourly, daily, or weekly) is already underway, it will fail, because you cannot take multiple snapshots concurrently. It will not simply wait until the current snapshot completes and then proceed.
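
One way to sidestep such collisions is to have the cron job check for a running snapshot and wait before starting its own. Here is a minimal Python sketch of that idea. The two callables, snapshot_in_progress and start_snapshot, are hypothetical placeholders for whatever you use to query the cluster and to trigger the Curator action; they are not part of Curator or Elasticsearch. The polling interval and timeout are arbitrary assumptions as well.

```python
import time
from typing import Callable


def run_when_idle(
    snapshot_in_progress: Callable[[], bool],  # hypothetical: True while any snapshot runs
    start_snapshot: Callable[[], None],        # hypothetical: triggers the snapshot action
    poll_seconds: float = 60.0,                # assumed polling interval
    max_wait_seconds: float = 3600.0,          # assumed upper bound on waiting
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Start a snapshot only once no other snapshot is running.

    Returns True if the snapshot was started, False if the wait timed out.
    """
    waited = 0.0
    while snapshot_in_progress():
        if waited >= max_wait_seconds:
            return False  # give up; the next scheduled run can try again
        sleep(poll_seconds)
        waited += poll_seconds
    start_snapshot()
    return True
```

This is a check-then-act loop, so two jobs that poll at the same moment can still race; the snapshot call itself should still be treated as something that may fail.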

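A single repo with different name patterns would permit reuse of existing segments, because snapshots are incremental only within a repository. With three separate repos, each one has to store its own copy of the segments, so you could end up paying for extra storage.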
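
With a single repo, the retention rules from your plan can be keyed on the snapshot name prefix instead of on the repo. The following Python sketch only illustrates the selection logic. The prefixes hourly-, daily- and weekly- are an assumed naming scheme, and the mapping of snapshot names to creation times is assumed to come from whatever tool lists your snapshots.

```python
from datetime import datetime, timedelta

# Retention windows from the plan in the question; the prefixes are assumptions.
RETENTION = {
    "hourly-": timedelta(hours=24),
    "daily-": timedelta(days=7),
    "weekly-": timedelta(weeks=4),
}


def expired_snapshots(snapshots, now):
    """Return names of snapshots older than the retention for their prefix.

    `snapshots` maps snapshot name -> creation time (datetime).
    Names with an unknown prefix are never selected for deletion.
    """
    expired = []
    for name, created in snapshots.items():
        for prefix, keep_for in RETENTION.items():
            if name.startswith(prefix) and now - created > keep_for:
                expired.append(name)
                break
    return sorted(expired)
```

Snapshots whose names match none of the prefixes (a manual snapshot taken before an upgrade, for example) are left alone, which is usually the safer default for a delete job.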

Not especially, unfortunately. These are mainly determined by comfort level and desires for redundancy. Your current approach appears to be well thought out.

Considering that snapshots are at the segment level, and not the data level, this approach may be a bit overkill. But it's all about what your requirements are, not someone else's arbitrary idea of what you should need.

Thank you for the quick and clear reply. It answered all my questions.
