Incremental Snapshot details?

Hey Guys,

As the Elasticsearch documentation states, snapshots are incremental, meaning each new snapshot only looks for differences since the last snapshot.

Is the difference always calculated from the previous snapshot, and is there a way in the API to total up the difference?

Regards, Peter

Please don't confuse incremental data backups with Elasticsearch's incremental snapshots. Elasticsearch does indeed check for changes between a previous snapshot and the current one, but it does so at the segment level, not the document level.

The most fundamental unit of an Elasticsearch index is not the shard; it is the segment. Lucene segments are immutable data objects, created when buffered documents are flushed. As the number of segments grows, Lucene merges many smaller segments into one larger segment in the background, which keeps per-segment overhead (open files, memory, search cost) under control. The merged segment is again immutable: segments can be created or deleted, but never edited. All of this is completely transparent to the end user.

As a result, your incremental snapshots may duplicate some data. Suppose segments A, B, and C were captured in snapshot_1. Afterwards, new segments D, E, and F are created, and segments A and B are merged into segment G. Your incremental snapshot_2 will then copy segments D, E, F, and G; segment C, which did not change, is not copied again. The documents from segments A and B now exist in both snapshot_1 and snapshot_2, but only because A and B have become G, which Elasticsearch sees as a new segment.
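To make the A to G scenario concrete, here is a minimal Python sketch that models each snapshot as a set of segment names and computes which segments an incremental snapshot would have to copy. It is a toy model that assumes a segment is identified only by its name; the function `segments_to_copy` is hypothetical and is not an Elasticsearch API.

```python
# Toy model (hypothetical, not an Elasticsearch API): a segment is just a name,
# and a snapshot is the set of segments making up the index at snapshot time.

def segments_to_copy(repository: set[str], current: set[str]) -> set[str]:
    """Return the segments a new snapshot must copy: those not already stored."""
    return current - repository


# snapshot_1 captures segments A, B, C; the repository starts empty.
repository: set[str] = set()
snapshot_1 = {"A", "B", "C"}
copied_1 = segments_to_copy(repository, snapshot_1)
repository |= copied_1

# Later: D, E, F are created, and A + B are merged into the new segment G.
snapshot_2 = {"C", "D", "E", "F", "G"}
copied_2 = segments_to_copy(repository, snapshot_2)
repository |= copied_2

print(sorted(copied_1))  # ['A', 'B', 'C']
print(sorted(copied_2))  # ['D', 'E', 'F', 'G']
```

G is copied even though its documents already sit in the repository inside A and B, which is exactly the duplication described above, while the unchanged C is only referenced.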

So, to answer your question, "is there a way in the API to total up the difference?": no. Everything about segment diffs is handled internally by Elasticsearch. You can restore an index to a point in time, but the restore will include all of the segments that made up the index at that point in time. If you restore over the top of an existing index, that index must be closed first. Elasticsearch can then restore the segments it needs over the closed, inactive index, but it always restores at the index level.


Thanks for the help; this clears things up quite well.

Is there a way, though, to determine from the API whether a snapshot is incremental or completely new (i.e., uses no previous snapshots)?
Based on what you said in your previous post, it seems there isn't, because it's handled internally, but it doesn't hurt to ask ;)

The only way you could be 100% certain of that would be if that snapshot were the only one in the repository. So, no: the API provides no way to be certain of that.

It's important to note again that snapshot and restore operate at the index level. The segments used by any index are referenced by pointers in the snapshot. That's how a snapshot can be incremental yet still restore to a point in time: the snapshotted index is a list of pointers to the segments it needs to be fully restored to that point in time. The segments themselves are copied to the repository only when no instance of that segment is already there.
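The pointer idea can be sketched as a small Python class in which segment contents are stored once and each snapshot is only a list of segment names. This is a hypothetical toy model: `Repository`, `take_snapshot`, and `restore` are illustrative names, not Elasticsearch's on-disk layout or API.

```python
# Toy model of a snapshot repository (hypothetical, not Elasticsearch's layout):
# segment contents are stored once, and each snapshot is a list of pointers.

class Repository:
    def __init__(self) -> None:
        self.segments: dict[str, list[str]] = {}   # segment name -> its documents
        self.snapshots: dict[str, list[str]] = {}  # snapshot name -> segment pointers

    def take_snapshot(self, name: str, index: dict[str, list[str]]) -> list[str]:
        """Point at every segment in the index; copy only segments not yet stored."""
        copied = []
        for seg, docs in index.items():
            if seg not in self.segments:  # no stored instance yet, so copy it
                self.segments[seg] = list(docs)
                copied.append(seg)
        self.snapshots[name] = sorted(index)  # the snapshot itself is just pointers
        return sorted(copied)

    def restore(self, name: str) -> dict[str, list[str]]:
        """Rebuild the whole index as of that snapshot by following its pointers."""
        return {seg: list(self.segments[seg]) for seg in self.snapshots[name]}


repo = Repository()
print(repo.take_snapshot("snapshot_1", {"A": ["doc1"], "B": ["doc2"], "C": ["doc3"]}))
# ['A', 'B', 'C']
print(repo.take_snapshot("snapshot_2", {"C": ["doc3"], "D": ["doc4"], "G": ["doc1", "doc2"]}))
# ['D', 'G']
print(repo.restore("snapshot_1"))
# {'A': ['doc1'], 'B': ['doc2'], 'C': ['doc3']}
```

In this model the stored snapshot record holds only pointers, so once it is written you cannot tell which of its segments were freshly copied and which were reused, much like the answer about the API above.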

Awesome, thanks!
