Snapshot Strategy for archival


(Carmen Spohn) #1

Hello. As a disclaimer: I am well aware that snapshots created on a certain version of Elasticsearch may only be restored in a cluster of the same version, and that snapshots are not a foolproof archival solution, but they come really close for our use case. We want to keep logs available for 90 days for Kibana users to query, and then keep a year of logs in archival/cold storage just in case. If the need came up for logs that are 6 months old, we know there would be work involved in bringing them back (we would restore into an appropriate cluster when the time comes).

We generate about 10 daily indices today. After 48 hours or so, we force merge them to reduce the number of segments. Because they are not that big, each has 1 shard and 1 replica. On a daily basis we would like to snapshot each daily index and then delete it from the cluster (to save disk space). We would keep these snapshots for (365 - 90) = 275 days, and after that we would delete them, since the data would be too old anyway. I have been reading various posts and wondered whether scalability would be a concern in our case, and whether one strategy would work better than the other for snapshotting all this data. I see two possible options:

a) Snapshot each index individually after it has been force merged. This would result in 10 * 275 = 2750 snapshots in the repository, each containing one index.

b) Snapshot all 10 daily indices together. This would result in 1 * 275 = 275 snapshots in the repository, each containing 10 indices.
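To make the difference between the two options concrete, here is a sketch that builds the snapshot requests each option would send for one day (as `PUT /_snapshot/<repository>/<snapshot>` paths with `indices` and `include_global_state` in the body) and counts how many snapshots accumulate over the 275-day retention. The repository name `logs_archive`, the index names, and the snapshot naming scheme are hypothetical, and nothing is actually sent to a cluster.

```python
import json

REPO = "logs_archive"        # hypothetical snapshot repository name
RETENTION_DAYS = 365 - 90    # days each snapshot is kept

def daily_indices(day: str) -> list:
    """Hypothetical names for the ~10 indices generated on one day."""
    return [f"app{n}-{day}" for n in range(10)]

def option_a(day: str) -> list:
    """a) One snapshot per index: (path, body) pairs for PUT requests."""
    return [
        (f"/_snapshot/{REPO}/{name}",
         {"indices": name, "include_global_state": False})
        for name in daily_indices(day)
    ]

def option_b(day: str) -> list:
    """b) One snapshot containing all of the day's indices."""
    return [
        (f"/_snapshot/{REPO}/logs-{day}",
         {"indices": ",".join(daily_indices(day)),
          "include_global_state": False})
    ]

if __name__ == "__main__":
    day = "2017.09.01"
    # Snapshots kept at steady state = snapshots per day * days retained.
    print("a:", len(option_a(day)) * RETENTION_DAYS)  # a: 2750
    print("b:", len(option_b(day)) * RETENTION_DAYS)  # b: 275
    path, body = option_b(day)[0]
    print("PUT", path, json.dumps(body))
```

With option b, each day's data is a single snapshot to restore or delete as a unit.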

Is one way better than the other, or are both equivalent?

Thank you!


(Mark Walkom) #2

From the docs (https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-snapshots.html#modules-snapshots):

That means that:

  • A snapshot of an index created in 2.x can be restored to 5.x.
  • A snapshot of an index created in 1.x can be restored to 2.x.
  • A snapshot of an index created in 1.x can not be restored to 5.x.

Which is good news :slight_smile:
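The quoted rule depends on the major version an index was created in, not on the version that took the snapshot. A minimal sketch of the check, assuming only the rule as quoted (same major or the next major release, never an older one) and extending it to 6.x by the same pattern; later Elasticsearch versions may behave differently, so treat this as illustrative. `can_restore` is a hypothetical helper.

```python
# Major versions in release order; Elasticsearch went from 2.x directly to 5.x.
# Assumption: only majors up to 6.x are listed.
MAJORS = [1, 2, 5, 6]

def can_restore(index_created_major: int, cluster_major: int) -> bool:
    """True if an index created in one major can be restored into a cluster
    of the given major: the same major or the next one only."""
    step = MAJORS.index(cluster_major) - MAJORS.index(index_created_major)
    return step in (0, 1)

if __name__ == "__main__":
    print(can_restore(2, 5))  # True
    print(can_restore(1, 2))  # True
    print(can_restore(1, 5))  # False
    print(can_restore(5, 6))  # True
```

Under this rule, indices created in 5.x and snapshotted today should still restore into a 6.x cluster.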

However, you would probably want to go with the second strategy, as it's a little cleaner.


(Carmen Spohn) #3

Thanks @warkolm for your response! We are on 5.x, but I am sure life will take us to 6.x soon enough, so we understand the risk. We'll go with one snapshot a day, then. I was wondering how well the cluster would handle 2750 snapshots, hence my question here.
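With one snapshot a day, the schedule described in the first post can be sketched as follows: each day's indices are force merged after about two days, snapshotted and deleted from the cluster once their 90 searchable days are up, and the snapshot is deleted 275 days after that. The helper names are hypothetical, and the sketch assumes the snapshot is taken on the day the indices leave the cluster, which is what makes 90 + 275 add up to a full year.

```python
from datetime import date, timedelta

# Durations taken from the thread; the force-merge delay is the "48 hours or so".
FORCE_MERGE_AFTER = timedelta(days=2)
LIVE_PERIOD = timedelta(days=90)            # searchable in the cluster / Kibana
ARCHIVE_PERIOD = timedelta(days=365 - 90)   # kept only as a snapshot (275 days)

def lifecycle(index_day: date) -> dict:
    """Dates of each step for one day's indices (hypothetical helper).

    Assumes the snapshot is taken on the day the indices leave the cluster.
    """
    leaves_cluster = index_day + LIVE_PERIOD
    return {
        "force_merge": index_day + FORCE_MERGE_AFTER,
        "snapshot_then_delete_index": leaves_cluster,
        "delete_snapshot": leaves_cluster + ARCHIVE_PERIOD,
    }

def expired_snapshots(snapshot_days: list, today: date) -> list:
    """Snapshot dates that are older than the 275-day archive period."""
    return [d for d in snapshot_days if today - d > ARCHIVE_PERIOD]
```

For indices from 2017-01-01 this gives a force merge on 2017-01-03, the snapshot and index deletion on 2017-04-01, and snapshot deletion on 2018-01-01, exactly 365 days after the data was indexed.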

