Snapshot and restore | S3 and Glacier


(Piyush Goyal) #1

Hi,

In our production environment, we back up our ES cluster daily to an S3 repository. Since this has been running for a year and we now have a huge list of snapshots, we decided to move the old snapshots from S3 to Glacier. These are my questions about the process:

1.) I presume moving snapshots from S3 to Glacier is equivalent to deleting a snapshot. Am I right?
2.) As per the documentation, when a snapshot is deleted, Elasticsearch deletes all files that are associated with the deleted snapshot and not used by any other snapshots. I want to understand how this process works. Does a new snapshot contain only a delta of the documents changed/deleted/created since the last snapshot? Can deleting a snapshot cause data loss or affect later restore operations?

Any help would be appreciated.

Thanks
Piyush


(Piyush Goyal) #2

In addition to the previous questions, we have noticed that snapshots which are moved from S3 to Glacier using Amazon lifecycle configurations, and are not permanently deleted from Glacier, are still visible in the S3 console with their storage class marked as Glacier. When ES creates a new snapshot, it tries to read these Glacier-marked snapshots as well and logs a warning:
[2015-07-27 02:19:44,648][WARN ][index.snapshots.blobstore] [es_prod_cortez_node1] failed to read commit point [snapshot-cortezsnapshot_2015may17]
java.io.IOException: Failed to get [snapshot-cortezsnapshot_2015may17]
at org.elasticsearch.common.blobstore.support.AbstractBlobContainer.readBlobFully(AbstractBlobContainer.java:83)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$Context.buildBlobStoreIndexShardSnapshots(BlobStoreIndexShardRepository.java:370)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:420)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:131)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:86)
at org.elasticsearch.snapshots.SnapshotsService$6.run(SnapshotsService.java:829)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: 2066FA56825CF829), S3 Extended Request ID: OnfdWFKLyXq9SbXb/Ttb7nPFx/Ig97NAIW5XBCaz3JyxClu+zRtND0ZYES1RbRed
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:820)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:439)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:245)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3722)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1137)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1002)
at org.elasticsearch.cloud.aws.blobstore.AbstractS3BlobContainer$1.run(AbstractS3BlobContainer.java:83)
... 3 more

Either ES should ignore the snapshots which are marked as Glacier, or it should be able to read them.
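
To see which repository objects a lifecycle rule has already transitioned, one option is to check the storage class that S3 reports for each object. The following is a minimal sketch, assuming listing entries shaped like the `Contents` items of an S3 object listing (dicts with `Key` and `StorageClass`). The function name and the second key are hypothetical, and this is an external check, not something Elasticsearch does itself.

```python
def split_by_storage_class(objects):
    """Partition S3 object summaries into directly readable and archived keys."""
    readable, archived = [], []
    for obj in objects:
        # Default to STANDARD if the listing entry carries no storage class.
        storage_class = obj.get("StorageClass", "STANDARD")
        if storage_class == "GLACIER":
            archived.append(obj["Key"])
        else:
            readable.append(obj["Key"])
    return readable, archived


# Sample listing: the first key comes from the log above, the second is made up.
listing = [
    {"Key": "snapshot-cortezsnapshot_2015may17", "StorageClass": "GLACIER"},
    {"Key": "snapshot-cortezsnapshot_2015jul26", "StorageClass": "STANDARD"},
]

readable, archived = split_by_storage_class(listing)
print("archived (a plain GET fails with InvalidObjectState):", archived)
print("readable:", readable)
```

Any key that ends up in `archived` is one that a plain read, like the one in the stack trace, would be refused for.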


(Mark Walkom) #3

Glacier is an AWS abstraction so I don't know if ES will be able to tell?
I'll pass it on though and see if it's anything we can support.


(Mark Walkom) #4

I wonder if the problem (in part) is the response times of Glacier? From https://aws.amazon.com/glacier/

To keep costs low, Amazon Glacier is optimized for infrequently accessed data where a retrieval time of several hours is suitable.


(Mark Walkom) #5

I put all of this into here https://github.com/elastic/elasticsearch/issues/12500


(Piyush Goyal) #6

Thanks @warkolm. :blush: Also, can you help me with the questions from my very first post about how the snapshot thingy works when a snapshot is deleted? How do the other snapshots preserve the state of the backed-up data?

Thanks again.
Piyush


(Mark Walkom) #7

Because it works on a shard level, it'll hold onto any shards it needs to be able to restore any remaining snapshots and just delete the shards it doesn't need.


(Piyush Goyal) #8

Does that mean deleting an older snapshot would not cause any data loss if we restore from a recent snapshot, and that restoring from a newer snapshot will provide the complete data set? The word "incremental" in the documentation is causing a lot of confusion.


(Christian Dahlqvist) #9

It works on the segment level within a shard, and each segment is only copied once even if it is still in use for multiple snapshots. If you remove old snapshots, segments will only be deleted once there are no more snapshots linking to them.
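
The behaviour described here amounts to reference tracking: a segment file stays in the repository as long as at least one snapshot still points to it. The toy model below is only a sketch of that idea, not Elasticsearch's actual implementation; the class, method, and segment names are all hypothetical.

```python
class SnapshotRepo:
    """Toy model of a snapshot repository that shares segments between snapshots."""

    def __init__(self):
        self.snapshots = {}   # snapshot name -> set of segment names it references
        self.stored = set()   # segment files physically present in the repository

    def create(self, name, live_segments):
        # Each segment is copied once, even if several snapshots reference it.
        self.stored |= set(live_segments)
        self.snapshots[name] = set(live_segments)

    def delete(self, name):
        candidates = self.snapshots.pop(name)
        # Segments still referenced by any remaining snapshot must be kept.
        still_used = set().union(*self.snapshots.values())
        removed = candidates - still_used
        self.stored -= removed
        return removed


repo = SnapshotRepo()
repo.create("snap_1", {"seg_a", "seg_b"})
repo.create("snap_2", {"seg_b", "seg_c"})

removed = repo.delete("snap_1")
print("removed:", sorted(removed))
print("still stored:", sorted(repo.stored))
```

Deleting `snap_1` removes only `seg_a`; `seg_b` survives because `snap_2` still links to it, so `snap_2` remains fully restorable.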


(Mark Walkom) #10

You beat me to it :stuck_out_tongue:


(Piyush Goyal) #11

lol..!! :smiley: Another question to both of you, for the sake of my clarity: just like ES works with segments, where merging and all those processes happen in the background, does the same happen for snapshots as well? The first snapshot copies the segments, and as new snapshots are created, the new segments (or the delta of segments) are merged into the existing segment copy, and the new snapshots kind of reference that existing copy?


(Mark Walkom) #12

Say you have two segments for index A and take a snapshot, they are then merged and another segment is created and you take another snapshot. You will end up with all 4 segments in the snapshot repo as they are all different.

If you have another index with 2 segments and take a snapshot, but by the time you take your next snapshot those segments have not changed, then the second snapshot won't take another copy of those segments.

It doesn't merge the deltas, it just looks at the segments at snapshot time N and if the segments change at snapshot time N+1 it does a wholesale copy.
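
The copy decision at snapshot time can be sketched as a set difference between the segments currently live in the index and the segments already in the repository. This is an illustration only, with hypothetical function and segment names. It assumes that for index A the two original segments were merged into one (`A_seg3`) and that ongoing indexing also produced one new segment (`A_seg4`), while index B did not change between snapshots.

```python
def segments_to_copy(repo_segments, live_segments):
    """Return the live segments that are not yet present in the repository."""
    return set(live_segments) - set(repo_segments)


repo = set()

# Snapshot N of index A: both segments are new to the repository.
first_a = segments_to_copy(repo, {"A_seg1", "A_seg2"})
repo |= first_a

# Snapshot N+1 of index A: the live segments changed, so they are copied wholesale.
second_a = segments_to_copy(repo, {"A_seg3", "A_seg4"})
repo |= second_a

# Index B: unchanged between its two snapshots, so the second one copies nothing.
first_b = segments_to_copy(repo, {"B_seg1", "B_seg2"})
repo |= first_b
second_b = segments_to_copy(repo, {"B_seg1", "B_seg2"})

print("index A, second snapshot copies:", sorted(second_a))
print("index B, second snapshot copies:", sorted(second_b))
print("index A segments in repo:", sorted(s for s in repo if s.startswith("A_")))
```

Nothing is merged inside the repository: the old segments of index A stay alongside the new ones until no snapshot references them any more.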


(Piyush Goyal) #13

Aaha..!! I think I am getting closer. So if I delete a snapshot created 3 months back, that should not bother me at all, since in the past 3 months ES must have merged many segments together, and a fairly recent snapshot (not necessarily the one created yesterday) would contain those newly created segments. Not to mention that we index a good amount of data into the same index every day.

Also, if the segments at snapshot time N+1 have not changed and I delete the snapshot from time N, the segments won't get deleted since they are still linked by the N+1 snapshot. Right?


(Mark Walkom) #14

If you delete old snapshots ES is smart enough to only delete the segments it won't need to be able to restore every other snapshot.

Your second comment is correct.


(Piyush Goyal) #15

wohoo..!! Thanks @warkolm and @Christian_Dahlqvist for all the help. :blush: Can I publish it anywhere? I guess this is very valuable information. Thanks a ton again..! :blush:


(Mark Walkom) #16

Maybe we can massage this into a blog post for posterity :slight_smile:


(Piyush Goyal) #17

@warkolm: Another question which popped up in my mind: there are two ways to delete a snapshot, either through the ES API or by deleting it directly through the filesystem/AWS console. I guess if I delete a snapshot through the ES API, it would take care of what we discussed above. However, I believe if I delete it directly through the filesystem/AWS console, then what we discussed above won't hold true.
Am I right?
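
For reference, the API route is the snapshot delete endpoint, `DELETE /_snapshot/<repository>/<snapshot>`. The sketch below only builds that request with the Python standard library and does not send it. The host, the repository name `my_s3_repository`, and the helper name are assumptions for illustration; the snapshot name is taken from the log earlier in this thread.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_snapshot_request(base_url, repository, snapshot):
    """Build (but do not send) the HTTP request that deletes one snapshot."""
    url = "{}/_snapshot/{}/{}".format(
        base_url.rstrip("/"),
        quote(repository, safe=""),
        quote(snapshot, safe=""),
    )
    return Request(url, method="DELETE")


# Hypothetical host and repository name.
req = build_delete_snapshot_request(
    "http://localhost:9200", "my_s3_repository", "cortezsnapshot_2015may17"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually issue the delete, so that step is deliberately left out here.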


(Mark Walkom) #18

If you manually delete anything on the FS that belongs to ES, whether it's a snapshot, segment file, shard directory or anything else;

