Question on the backup snapshot

Based on the following guide, only the first snapshot is a complete copy of data, and all subsequent snapshots will save the delta between the existing snapshots and the new data.
https://www.elastic.co/guide/en/elasticsearch/guide/current/backing-up-your-cluster.html

I have a few questions:

  1. If I create a snapshot daily, there will be 365 snapshots a year later. Does that mean only the first one is complete, and the other 364 are incremental snapshots, each based on the previous one? In that case, we can't purge any old snapshots.

  2. Is it possible to force the creation of a snapshot with a complete copy of the data?

  3. I used /_snapshot/my_backup/_all to get all snapshots. From the response, I couldn't tell which snapshot holds a complete copy of the data. How can I tell which one is complete? Thanks.

    {
      "snapshots": [
        {
          "snapshot": "snapshot_1",
          "uuid": "MYx9ID8lSuOP_29JnSvnvw",
          "version_id": 6020299,
          "version": "6.2.2",
          "indices": [
            "t-paefhrifl744uhlgosut-kibana",
            "audit-paefhrifl744uhlgosut-201807"
          ],
          "include_global_state": true,
          "state": "SUCCESS",
          "start_time": "2018-07-27T05:43:50.427Z",
          "start_time_in_millis": 1532670230427,
          "end_time": "2018-07-27T05:43:50.647Z",
          "end_time_in_millis": 1532670230647,
          "duration_in_millis": 220,
          "failures": [],
          "shards": {
            "total": 6,
            "failed": 0,
            "successful": 6
          }
        },
        {
          "snapshot": "snapshot_2",
          "uuid": "ux7xpo4tS-m3aXeYmrkcEA",
          "version_id": 6020299,
          "version": "6.2.2",
          "indices": [
            "t-paefhrifl744uhlgosut-kibana",
            "audit-paefhrifl744uhlgosut-201807"
          ],
          "include_global_state": true,
          "state": "SUCCESS",
          "start_time": "2018-07-27T05:45:15.967Z",
          "start_time_in_millis": 1532670315967,
          "end_time": "2018-07-27T05:45:16.023Z",
          "end_time_in_millis": 1532670316023,
          "duration_in_millis": 56,
          "failures": [],
          "shards": {
            "total": 6,
            "failed": 0,
            "successful": 6
          }
        }
      ]
    }

It's not quite that simple, but at a high level that's true.

Per the guide you linked to, you can delete snapshots through the Elasticsearch API. You cannot delete them directly on the storage layer.
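For illustration, here is a minimal sketch of that API call using only Python's standard library. It builds the `DELETE /_snapshot/<repository>/<snapshot>` request without sending it. The cluster address `http://localhost:9200` and the helper name `build_delete_request` are assumptions made for this example; the repository and snapshot names come from the question.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(base_url, repository, snapshot):
    """Build (but do not send) a DELETE request for one snapshot."""
    # Hypothetical helper; quote() guards against odd characters in names.
    url = "{}/_snapshot/{}/{}".format(
        base_url.rstrip("/"),
        quote(repository, safe=""),
        quote(snapshot, safe=""),
    )
    return urllib.request.Request(url, method="DELETE")


if __name__ == "__main__":
    # Assumed local cluster address; adjust for a real deployment.
    req = build_delete_request("http://localhost:9200", "my_backup", "snapshot_1")
    print(req.get_method(), req.full_url)
    # To actually delete the snapshot, send the request:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```

Keeping the request construction separate from sending it makes it easy to review what would be deleted before anything touches the repository.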

Each repository is independent, so if you create a brand new repository then the first snapshot you save there will be a complete copy.
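A rough sketch of that approach in Python: register a second shared-filesystem repository, then take a snapshot in it. The repository name `my_backup_full`, the location path, the cluster address, and both helper names are assumptions for illustration. For an `fs` repository, the location also has to be listed under `path.repo` on every node.

```python
import json
import urllib.request


def build_register_repo_request(base_url, repository, location):
    """Build a PUT request that registers a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        "{}/_snapshot/{}".format(base_url.rstrip("/"), repository),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_create_snapshot_request(base_url, repository, snapshot):
    """Build a PUT request that starts a snapshot in the given repository."""
    url = "{}/_snapshot/{}/{}?wait_for_completion=true".format(
        base_url.rstrip("/"), repository, snapshot
    )
    return urllib.request.Request(url, method="PUT")


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumed cluster address
    # Hypothetical repository name and mount path.
    register = build_register_repo_request(
        base, "my_backup_full", "/mount/backups/my_backup_full"
    )
    create = build_create_snapshot_request(base, "my_backup_full", "snapshot_1")
    for req in (register, create):
        print(req.get_method(), req.full_url)
```

Because the new repository starts empty, the first snapshot written there has nothing to share with earlier snapshots.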

Why do you care? If it's just for deletion purposes, then you just need to use the API to manage that.

@TimV Thanks a lot for the quick response, which makes sense to me.

But I am still interested in how the snapshot deletion API works, because I need to work out a plan to delete the old data. Can you please provide more detailed info (workflow or algorithm?) on the deletion API?

Another thing is that I have to use the storage service provided by our own cloud infrastructure, so I have to implement a new elasticsearch plugin to support our own storage service. Are there any online documents/guides?

Thanks!

I don't understand. The delete API will delete the old data for you - you just need to decide when you aren't interested in keeping that snapshot any longer.
If you delete a snapshot that is still sharing data with another (presumably newer) snapshot, then the shared data will not be deleted.
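A retention plan can therefore be as simple as picking snapshots by age and deleting each one through the API. The sketch below selects candidates from a listing shaped like the `/_snapshot/my_backup/_all` response in the question, using its `start_time_in_millis` field. The function name, the 30-day window, and the "now" timestamp are assumptions for illustration.

```python
import json

DAY_MILLIS = 24 * 60 * 60 * 1000


def snapshots_to_delete(listing, now_millis, keep_days):
    """Return names of snapshots that started more than keep_days ago."""
    cutoff = now_millis - keep_days * DAY_MILLIS
    return [
        s["snapshot"]
        for s in listing["snapshots"]
        if s["start_time_in_millis"] < cutoff
    ]


if __name__ == "__main__":
    # Trimmed copy of the listing from the question.
    listing = json.loads("""
    {"snapshots": [
      {"snapshot": "snapshot_1", "start_time_in_millis": 1532670230427},
      {"snapshot": "snapshot_2", "start_time_in_millis": 1532670315967}
    ]}
    """)
    # Hypothetical "now": one minute short of 30 days after snapshot_2 started.
    now = 1532670315967 + 30 * DAY_MILLIS - 60000
    print(snapshots_to_delete(listing, now, keep_days=30))
```

Each returned name would then be passed to the snapshot delete API; files still referenced by newer snapshots stay in the repository.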

I don't believe so. Your best path will be to review and learn from the official repository plugins, and then ask questions if you need more info.

Let me ask this question another way. I created a snapshot named "snapshot_1", then fed more data into the Elasticsearch cluster, and then created another snapshot named "snapshot_2". The first snapshot, "snapshot_1", certainly had a complete copy of the data because it was the very first one. "snapshot_2" is most likely an incremental one based on snapshot_1. Finally, I deleted "snapshot_1", and the deletion succeeded. So I am a little confused: why could "snapshot_1" be deleted successfully when snapshot_2 (most likely) depends on it? Does it mean that snapshot_1 was not actually deleted, even though I got a successful response to the deletion?

It seems that Elasticsearch only supports the following four repository types, but I need to use our own cloud storage service, so I am asking for guides on how to implement a new Elasticsearch plugin. Did I miss anything? Thanks!

Shared filesystem, such as a NAS
Amazon S3
HDFS (Hadoop Distributed File System)
Azure Cloud

This blog post is old, but still explains the principles behind snapshot and restore quite well.

The snapshot was deleted. You cannot restore from that snapshot anymore. It may not have deleted all the files in use by that snapshot, but it doesn't claim to do so.

Note that snapshots are not incremental in the simplest sense of just writing deltas from the previous backup. They follow the underlying Lucene segments, which means merge events on the underlying indices will be reflected in the snapshots.
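To make the segment-based behavior concrete, here is a toy model in Python. It is not Elasticsearch's actual implementation, only a sketch of the idea that each snapshot references a set of segment files and that deleting a snapshot removes only files no other snapshot references. The class name and the segment file names are hypothetical.

```python
class ToyRepository:
    """Toy model: each snapshot references a set of segment files."""

    def __init__(self):
        self.files = set()      # segment files stored in the repository
        self.snapshots = {}     # snapshot name -> set of referenced files

    def snapshot(self, name, live_segments):
        """Record a snapshot; return the files that had to be copied."""
        live = set(live_segments)
        copied = live - self.files  # only files not already stored
        self.files |= copied
        self.snapshots[name] = live
        return copied

    def delete(self, name):
        """Drop a snapshot; return files no other snapshot still uses."""
        removed = self.snapshots.pop(name)
        still_used = set().union(*self.snapshots.values())
        orphaned = removed - still_used
        self.files -= orphaned
        return orphaned


if __name__ == "__main__":
    repo = ToyRepository()
    print(sorted(repo.snapshot("snapshot_1", ["seg_a", "seg_b"])))
    # More data indexed: a new segment appears, the old ones are unchanged.
    print(sorted(repo.snapshot("snapshot_2", ["seg_a", "seg_b", "seg_c"])))
    # A merge rewrites seg_a and seg_b into seg_d.
    print(sorted(repo.snapshot("snapshot_3", ["seg_c", "seg_d"])))
    # snapshot_2 still references seg_a and seg_b, so nothing is removed.
    print(sorted(repo.delete("snapshot_1")))
```

In this model, deleting "snapshot_1" succeeds and the snapshot can no longer be restored, yet "snapshot_2" remains fully restorable because the files it references are kept.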

And my answer was that we don't have docs for that, and your best option is to look at the code for the existing plugins, and use that to guide you.

@Christian_Dahlqvist @TimV Thanks to both of you for the help.
