Snapshot Strategy for archival

cgspohn · August 10, 2017, 9:13pm

Hello. As a disclaimer: I am well aware that snapshots created on a certain version of Elasticsearch may only be restored in a cluster of the same version, and that snapshots are not a real, foolproof archival solution, but they come really close for our use case. We want to keep 90 days of logs available for Kibana users to query, and then keep a year in archival/cold storage just in case. If the need came up for logs that are 6 months old, we know there would be work involved in bringing them back (we would restore them with the appropriate cluster when the time comes).

We generate about 10 daily indices today. After 48 hours or so, we force merge them to reduce the number of segments. Because they are not that big, each has 1 shard and 1 replica. On a daily basis we would like to snapshot each daily index and then delete it from the cluster (to save disk space). We would keep these snapshots for (365-90)=275 days, and after that we would delete the snapshot, since the data would be too old anyway. I have been reading various posts and wondered if scalability would be a concern in our case. I wonder if one strategy would work better than the other to snapshot all this data. I see two possible options:

a) Snapshot each index individually after it has been optimized. This would result in 10 * 275 = 2750 snapshots in the cluster, and each snapshot would contain one index.

b) Snapshot all 10 daily indices together. This would result in 1 * 275 = 275 snapshots in the cluster, and each snapshot would contain 10 indices.

Is one way better than the other? Or are both equivalent?
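For anyone comparing the two options, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in the post; the function name `snapshot_counts` and the constant names are made up for the example.

```python
# Figures taken from the post: keep a year in total, 90 days online,
# and roughly 10 new indices per day.
TOTAL_RETENTION_DAYS = 365
ONLINE_DAYS = 90
DAILY_INDICES = 10


def snapshot_counts(total_days, online_days, daily_indices):
    """Return how many snapshots each option keeps at steady state."""
    archive_days = total_days - online_days
    return {
        "archive_days": archive_days,
        # Option a): one snapshot per index per day.
        "option_a": daily_indices * archive_days,
        # Option b): one snapshot per day holding all daily indices.
        "option_b": archive_days,
    }


counts = snapshot_counts(TOTAL_RETENTION_DAYS, ONLINE_DAYS, DAILY_INDICES)
print(counts)
```

Changing `DAILY_INDICES` shows how option a) grows with the number of indices while option b) stays tied to the number of archived days.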

Thank you!

warkolm · August 10, 2017, 9:22pm

From the docs (Snapshot And Restore | Elasticsearch Reference [5.5] | Elastic):

That means that:

A snapshot of an index created in 2.x can be restored to 5.x.

A snapshot of an index created in 1.x can be restored to 2.x.

A snapshot of an index created in 1.x can not be restored to 5.x.
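The three statements quoted above amount to "same major version or the next one". A rough sketch of that rule in Python follows. The helper name `can_restore` is hypothetical, the check ignores minor versions, and listing 6 as the release after 5 is an assumption that the same rule carries forward.

```python
# Major versions in release order; Elasticsearch went from 2.x straight to 5.x.
# Including 6 is an assumption that the quoted rule continues to hold.
MAJOR_VERSIONS = [1, 2, 5, 6]


def can_restore(snapshot_major, cluster_major):
    """Return True if an index snapshotted on snapshot_major should be
    restorable on cluster_major under the quoted rule (same major version
    or the next one). Minor versions are ignored in this sketch."""
    created = MAJOR_VERSIONS.index(snapshot_major)
    target = MAJOR_VERSIONS.index(cluster_major)
    return 0 <= target - created <= 1


for created, target in [(2, 5), (1, 2), (1, 5)]:
    print(f"{created}.x -> {target}.x:", can_restore(created, target))
```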

Which is good news.

However, you would probably want to go with the second strategy, as it's a little cleaner.

cgspohn · August 10, 2017, 9:27pm

Thanks @warkolm for your response! We are on 5.x, but I am sure life will take us to 6.x soon enough. So we understand the risk. We'll go for the one snapshot a day then. I was wondering how well the cluster would handle 2750 snapshots, thus my question here.
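For reference, the one-snapshot-a-day approach could look roughly like the sketch below. It only builds the request for the create-snapshot endpoint (`PUT /_snapshot/<repository>/<snapshot>`) and does not send anything. The repository name `archive_repo`, the `daily-` snapshot prefix, and the `logs-app1` / `logs-app2` index prefixes are all hypothetical, as is the assumption that the daily indices are named `<prefix>-YYYY.MM.DD`.

```python
import json
from datetime import date


def build_daily_snapshot_request(repository, day, index_prefixes):
    """Build (method, path, body) for one snapshot holding all of a day's indices.

    Assumes daily indices are named "<prefix>-YYYY.MM.DD" (hypothetical naming).
    """
    stamp = day.strftime("%Y.%m.%d")
    indices = [f"{prefix}-{stamp}" for prefix in index_prefixes]
    path = f"/_snapshot/{repository}/daily-{stamp}?wait_for_completion=true"
    body = {
        # Comma-separated list of the indices to include in this snapshot.
        "indices": ",".join(indices),
        # Fail instead of silently skipping an index that is missing.
        "ignore_unavailable": False,
        # Only the log indices are wanted, not cluster-wide state.
        "include_global_state": False,
    }
    return "PUT", path, json.dumps(body)


method, path, body = build_daily_snapshot_request(
    "archive_repo", date(2017, 8, 10), ["logs-app1", "logs-app2"]
)
print(method, path)
print(body)
```

Before deleting the day's indices from the cluster, it would be sensible to check that the snapshot finished with state `SUCCESS`.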

system · September 7, 2017, 9:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
