Basic question about snapshots

Elasticsearch snapshots are incremental. As I understand it, when I create a snapshot of an index, Elasticsearch looks in the repository for other snapshots and, if it finds one, only stores the delta between the previous snapshot and the one I'm currently taking.

So, here is the thing I don't understand:

  1. I create an index called 'events-2017.01.04' and insert 100 events into it.
  2. I take a snapshot called 'snapshot-2017.01.04-1'
  3. I insert another 100 events into the index.
  4. I take a second snapshot called 'snapshot-2017.01.04-2'
  5. I then delete the index and the first snapshot 'snapshot-2017.01.04-1'
  6. I restore the second snapshot 'snapshot-2017.01.04-2' (note: at this point this is the only snapshot in the repository)
  7. The index is restored and has 200 events in it.

I find this confusing. If the snapshot is incremental, shouldn't I only have 100 events in that index? The second snapshot was taken when the index had 200 events, but it should have included only 100 of them, since the first snapshot already had the other 100.

What am I missing?

Snapshots are incremental at the segment level, which means the same segment will not be backed up more than once even if it is part of multiple snapshots. This is described quite well in this blog post. A snapshot always contains all data that existed in the index when it was taken, so in your example the expected outcome is 200 events.
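To make the segment-level idea concrete, here is a minimal toy model in Python. It is not how Elasticsearch is implemented; the `ToyRepository` class and the segment names `_0` and `_1` are made up for illustration, and it assumes the second batch of 100 events lands in a new segment without any merge. It only shows the principle: a snapshot references every segment of the index, but copies just the ones the repository does not hold yet.

```python
# Toy model only: class, segment names and documents are hypothetical.
class ToyRepository:
    def __init__(self):
        self.blobs = {}       # segment name -> documents stored in that segment
        self.snapshots = {}   # snapshot name -> segment names it references

    def snapshot(self, name, index_segments):
        """Reference every segment of the index, but copy only the missing ones."""
        copied = []
        for seg_name, docs in index_segments.items():
            if seg_name not in self.blobs:
                self.blobs[seg_name] = list(docs)
                copied.append(seg_name)
        self.snapshots[name] = list(index_segments)
        return copied

    def restore(self, name):
        """Rebuild the full index from all segments the snapshot references."""
        return [doc for seg in self.snapshots[name] for doc in self.blobs[seg]]


repo = ToyRepository()

# Steps 1-2: 100 events in one segment, then the first snapshot.
index = {"_0": [f"event-{i}" for i in range(100)]}
copied_first = repo.snapshot("snapshot-2017.01.04-1", index)

# Steps 3-4: 100 more events in a new segment, then the second snapshot.
index["_1"] = [f"event-{i}" for i in range(100, 200)]
copied_second = repo.snapshot("snapshot-2017.01.04-2", index)

restored = repo.restore("snapshot-2017.01.04-2")
print(copied_first, copied_second, len(restored))
```

In this sketch the second snapshot copies only `_1`, yet restoring it gives back all 200 events, because it still references `_0`.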

Thanks @Christian_Dahlqvist, I've read the blog post but I'm still a bit unsure how it works. Can you please elaborate? What do you mean by 'segment level'? If I compare the first snapshot to the second in terms of file size, will the second snapshot be twice as big?

The size of the various segments depends on how they have been merged, so it is hard to predict.

A snapshot always contains the full state of the index at the point in time when the snapshot was taken. Storage-wise they might be incremental by reuse of data from older snapshots in the same repository, but whether it actually is incremental depends on the merging of the segment files.
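A rough way to see why merging matters, again as a toy sketch rather than actual Elasticsearch code: treat the index and the repository as sets of segment file names (the names and the `segments_to_copy` helper are hypothetical). A merge replaces old segments with a new file, so the next snapshot finds nothing to reuse for that data.

```python
def segments_to_copy(index_segments, repository_segments):
    """Return the segment files the next snapshot would have to copy."""
    return sorted(set(index_segments) - set(repository_segments))


# Segments already stored by an earlier snapshot (hypothetical names).
in_repository = {"_0", "_1"}

# Case 1: no merge happened; the index just gained one new segment.
no_merge = segments_to_copy({"_0", "_1", "_2"}, in_repository)

# Case 2: _0, _1 and the new data were merged into one new segment _3.
after_merge = segments_to_copy({"_3"}, in_repository)

print(no_merge, after_merge)
```

In the first case only the small new segment is copied. In the second case the single merged segment holds all the data and has to be copied in full, so the snapshot is not incremental in terms of storage.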

If the older snapshots are deleted, the physical segment files won't be deleted if they're referenced by newer snapshots, i.e. you'll never "break" new snapshots by deleting older ones, but it's also possible that you won't reclaim any disk space with the deletions.
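The deletion behaviour can be sketched the same way. This is an illustrative model only; the `delete_snapshot` helper, the snapshot names and the per-segment event counts are assumptions, not Elasticsearch internals. A segment file is removed only when no remaining snapshot references it.

```python
def delete_snapshot(snapshots, blobs, name):
    """Remove a snapshot and return the segment files deleted along with it."""
    removed = snapshots.pop(name)
    still_referenced = {seg for segs in snapshots.values() for seg in segs}
    freed = [seg for seg in removed if seg not in still_referenced]
    for seg in freed:
        del blobs[seg]
    return freed


# Hypothetical repository: segment name -> number of events it holds.
blobs = {"_0": 100, "_1": 100}
snapshots = {"snapshot1": ["_0"], "snapshot2": ["_0", "_1"]}

freed = delete_snapshot(snapshots, blobs, "snapshot1")
restorable_events = sum(blobs[seg] for seg in snapshots["snapshot2"])

print(freed, sorted(blobs), restorable_events)
```

Here deleting `snapshot1` frees nothing, because `snapshot2` still references `_0`, and `snapshot2` can still restore all 200 events.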

Thanks @magnusbaeck, so just to clarify, if I take snapshot1, and after some time I take snapshot2, the second snapshot will contain the delta and reference some segments in snapshot1. Then when I delete snapshot1, if it contains segments that are referenced by other snapshots, they will not be deleted. Is that correct?

That's correct.

That makes sense, thanks a lot for the answers.
