Is snapshot incremental?

I recently set up a 3-node Elasticsearch cluster and tried an automated snapshot process. Below is the script I used. Seeing the size of the backups, I am wondering whether this is really expected. As of now I haven't started pushing data to this cluster, though.
Script:

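#!/bin/bash
index=backup
# Capture the current date and time via command substitution
date=$(date +%Y.%m.%d)
time=$(date +"%T")
#echo "Starting Backup"
# Create a snapshot named e.g. backup-2015.08.26-22:30:01 in the repository "test"
curl -XPUT "http://localhost:9200/_snapshot/test/${index}-${date}-${time}"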

And here is the output of ls -lrth in the backup folder:

-rw-r--r-- 1 elasticsearch elasticsearch 249 Aug 26 22:30 snapshot-backup-2015.08.26-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 257 Aug 27 22:30 snapshot-backup-2015.08.27-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 262 Aug 28 22:30 snapshot-backup-2015.08.28-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 267 Aug 29 22:30 snapshot-backup-2015.08.29-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 275 Aug 30 22:30 snapshot-backup-2015.08.30-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 285 Aug 31 22:30 snapshot-backup-2015.08.31-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 296 Sep 1 22:30 snapshot-backup-2015.09.01-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 319 Sep 2 22:30 snapshot-backup-2015.09.02-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 331 Sep 3 22:30 snapshot-backup-2015.09.03-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 336 Sep 4 22:30 snapshot-backup-2015.09.04-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 335 Sep 5 22:30 snapshot-backup-2015.09.05-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 337 Sep 6 22:30 snapshot-backup-2015.09.06-22:30:01
-rw-r--r-- 1 elasticsearch elasticsearch 343 Sep 7 22:30 snapshot-backup-2015.09.07-22:30:01

This has used about 4 GB of disk space, and I see the disk usage increasing each time. Is this really expected with incremental snapshots?

They are incremental file by file. The files in the underlying Lucene indexes are immutable and when they are snapshotted they are not re-saved if they already exist in the snapshot. This is safe because they are only cleaned up when they are not used by any snapshots.

So they aren't truly incremental, which is what allows you to delete the old snapshots, but they won't redo the same work twice. Mostly.

New files are created in the underlying Lucene index to handle new documents, deletes, and updates. If the index doesn't change at all, then the old files will still be there and the snapshot won't take up much space. If there are just new documents, most of the old files should still be there along with some new ones, so the snapshot should seem incremental. But sometimes indexing new documents ends up causing merges, which replace the old files with new ones. In those cases it'll seem much less incremental.
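To make the file-by-file behaviour concrete, here is a toy model in shell. It is only an illustration of the idea described above, not how Elasticsearch is implemented: each set of files is a plain space-separated list, the segment file names (_0.cfs and so on) are made up, and only_in is a hypothetical helper.

```shell
#!/bin/bash
# Toy model only: file sets are space-separated lists of made-up segment names.

# contains LIST FILE: succeed if FILE is one of the words in LIST.
contains() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# only_in A B: print every file that appears in list A but not in list B.
only_in() {
  for f in $1; do
    contains "$2" "$f" || echo "$f"
  done
}

# Files a new snapshot has to copy = files in the index that the repository lacks.
# New documents added one segment, so only that segment is copied:
only_in "_0.cfs _1.cfs _2.cfs" "_0.cfs _1.cfs"   # prints _2.cfs

# A merge replaced every old segment with one new file, so all of it is copied:
only_in "_3.cfs" "_0.cfs _1.cfs _2.cfs"          # prints _3.cfs

# Files that may be cleaned up when a snapshot is deleted = its files that
# no remaining snapshot references:
only_in "_0.cfs _1.cfs" "_1.cfs _2.cfs"          # prints _0.cfs
```

The same set difference covers both directions: what a new snapshot adds to the repository, and what can be dropped once an old snapshot is deleted.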

It works this way because it's the easiest way to implement something incremental-like while still allowing old snapshots to be removed. It makes restores fast as well. In many ways it's genius, but it makes the sizes of the data complex to explain.

So, if I understood your explanation correctly: I can still recover my data even if I delete an older snapshot, say one from a week ago. I just don't want to fill up my disk space with unnecessary snapshots.

I can still recover my data even if I delete an older snapshot, say one from a week ago

Yes, but make sure you delete snapshots via the APIs and not by directly manipulating the repository directory.
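As a rough sketch of what deleting through the API could look like for the script above: the snapshot API removes a snapshot with an HTTP DELETE on its name. The repository name test, the host, and the backup-DATE-TIME naming come from the original script. The helper names older_than and delete_snapshot and the cutoff date are hypothetical.

```shell
#!/bin/bash
# Sketch only: select snapshots by the date embedded in their names, then
# delete them through the snapshot API instead of touching the repository files.
repo_url="http://localhost:9200/_snapshot/test"   # host and repository from the script above

# older_than CUTOFF: read snapshot names from stdin (one per line) and print
# those whose embedded date is before CUTOFF (format YYYY.MM.DD).
older_than() {
  cutoff=$(printf '%s' "$1" | tr -d .)       # 2015.09.01 -> 20150901
  while read -r name; do
    day="${name#backup-}"                    # strip the "backup-" prefix
    day="${day%%-*}"                         # keep only the date part
    day=$(printf '%s' "$day" | tr -d .)      # 2015.08.26 -> 20150826
    if [ "$day" -lt "$cutoff" ]; then
      echo "$name"
    fi
  done
}

# delete_snapshot NAME: ask Elasticsearch to delete one snapshot.
delete_snapshot() {
  curl -XDELETE "$repo_url/$1"
}
```

With a list of snapshot names in hand (for example taken from the response to a GET on http://localhost:9200/_snapshot/test/_all), something like `older_than 2015.09.01 < names.txt | while read -r s; do delete_snapshot "$s"; done` would remove the older ones. Here names.txt is an assumed file holding one snapshot name per line.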

Quoting for truth. This cannot be overstated: always use the API! :)

Thank you all for the golden words.
