Incremental Snapshot / Restore

Hi,

I am trying out the snapshot/restore feature in ES. I did the following:
(1) Registered one of my local directories as a repository
(2) Took a snapshot of the indexed data using the command PUT /_snapshot/repo1/snapshot1
(3) Made modifications and re-indexed my original data
(4) Took another snapshot with PUT /_snapshot/repo1/snapshot2
(5) (a) Deleted the index that had all the data, then restored snap2
(b) Deleted the index that had all the data, deleted snap1, then restored snap2
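
For anyone who wants to reproduce the steps above, here is a minimal sketch of the REST calls for variant 5 (b). It only builds and prints the requests rather than sending them. The index name `my_index` and the location `/tmp/es_backups` are assumptions for illustration, not values from my setup; the re-indexing in step (3) happens outside this list.

```python
import json

def snapshot_test_requests(repo="repo1", index="my_index", location="/tmp/es_backups"):
    """Return (method, path, body) triples for the test sequence, without sending them.

    `index` and `location` are hypothetical placeholders.
    """
    return [
        # (1) register a shared-filesystem repository
        ("PUT", f"/_snapshot/{repo}", {"type": "fs", "settings": {"location": location}}),
        # (2) first snapshot
        ("PUT", f"/_snapshot/{repo}/snapshot1?wait_for_completion=true", None),
        # (4) second snapshot, taken after the data was modified and re-indexed
        ("PUT", f"/_snapshot/{repo}/snapshot2?wait_for_completion=true", None),
        # (5b) delete the index, delete the first snapshot, restore the second
        ("DELETE", f"/{index}", None),
        ("DELETE", f"/_snapshot/{repo}/snapshot1", None),
        ("POST", f"/_snapshot/{repo}/snapshot2/_restore", None),
    ]

for method, path, body in snapshot_test_requests():
    print(method, path, json.dumps(body) if body else "")
```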

If I understood it right, snap2 is by default an incremental snapshot. For both 5 (a) and 5 (b) I see all my data, as expected. My question is:
Shouldn't restoring snap2 restore only the delta it is aware of? How is it able to restore all the data? Even after I had deleted snap1, how was it able to recover all the files? Could someone explain the incremental concept better?

Kindly help ...

Thanks in advance !!

If you snapshot a second time in snapshot1 then you will copy only the delta. It's incremental in that way.

If you create another snapshot snapshot2, then you are doing again a full backup.

That said, your test is strange to me. snapshot1 should not contain the new data when you restored it.

Thanks for the reply David !!

Basically, I was trying out this scenario to figure out whether I can implement a cleanup policy that removes snapshots created before a particular date.
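
To make the idea concrete, here is a rough sketch of the kind of cleanup policy I have in mind. It assumes the snapshot listing has the shape returned by GET /_snapshot/repo1/_all (a `snapshots` array whose entries carry `snapshot` and `start_time_in_millis`); the sample listing and its timestamps are made up, and the helper names are my own.

```python
from datetime import datetime, timezone

def snapshots_older_than(listing, cutoff):
    """Return the names of snapshots that started before `cutoff` (an aware datetime)."""
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return [
        s["snapshot"]
        for s in listing.get("snapshots", [])
        if s["start_time_in_millis"] < cutoff_ms
    ]

def delete_paths(repo, names):
    """Build the DELETE paths for the given snapshot names."""
    return [f"/_snapshot/{repo}/{name}" for name in names]

# Hypothetical listing; the timestamps are invented for illustration.
sample = {
    "snapshots": [
        {"snapshot": "snapshot1", "start_time_in_millis": 1420070400000},  # 2015-01-01 UTC
        {"snapshot": "snapshot2", "start_time_in_millis": 1422748800000},  # 2015-02-01 UTC
    ]
}

cutoff = datetime(2015, 1, 15, tzinfo=timezone.utc)
old = snapshots_older_than(sample, cutoff)
print(delete_paths("repo1", old))  # ['/_snapshot/repo1/snapshot1']
```

Whether such a policy is safe depends on what deleting an old snapshot does to the newer ones, which is what I am trying to find out.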

As per your clarification, if I try to snapshot a second time into "snapshot1", I see this exception:
{
"error": "InvalidSnapshotNameException[[repo1:snapshot1] Invalid snapshot name [snapshot1], snapshot with such name already exists]",
"status": 400
}

How do I specify that this is incremental? Am I missing something?

Yeah sorry. I was wrong.
You need to create a new snapshot each time you want to back up.

Each snapshot has a list of segments. When you create a new snapshot, only new segments are copied.

The repository contains the full backup.

If you restore snapshot2, you will restore the latest version. If you restore snapshot1, you should see only the original data.

Make sense?

Yes David, it does.

So if I understood it right, the expectation is:
-> if I restore snap2, I should see only the new segments (my observation was different though; I could see both old and new segments)?

In my initial post I listed the steps I followed, and this was my question:
Shouldn't restoring snap2 restore only the delta it is aware of? How is it able to restore all the data? Even after I had deleted snap1, how was it able to recover all the files?

Please clarify...

No.

Let me explain with an example:

Snapshot1 pushed segments 1, 2, 3 to repository X.
Snapshot2 needs to back up segments 1, 2, 3, 4 (4 contains the new data). But repository X already has segments 1, 2, 3, so the _snapshot action will copy only segment 4 for snapshot2.

When you delete snapshot1, those files are not removed, because snapshot2 also has links to segments 1, 2, 3.

When you restore snapshot2, you restore all files linked: segments 1, 2, 3 and 4.
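
The example above can be written as a small toy model. This is only a sketch of the bookkeeping idea (shared files plus per-snapshot links), not Elasticsearch's actual implementation; the `ToyRepository` class and its methods are invented for illustration.

```python
class ToyRepository:
    """Toy model: a repository stores each segment once; snapshots only link to segments."""

    def __init__(self):
        self.files = set()      # segments physically stored in the repository
        self.snapshots = {}     # snapshot name -> set of segments it links to

    def snapshot(self, name, index_segments):
        """Record a snapshot and return the segments that actually had to be copied."""
        if name in self.snapshots:
            raise ValueError(f"snapshot with name {name} already exists")
        segments = set(index_segments)
        copied = segments - self.files   # only segments the repository lacks
        self.files |= copied
        self.snapshots[name] = segments
        return copied

    def delete(self, name):
        """Delete a snapshot and return the segments that were physically removed."""
        removed = self.snapshots.pop(name)
        still_linked = set().union(*self.snapshots.values())
        deleted = removed - still_linked  # keep anything another snapshot links to
        self.files -= deleted
        return deleted

    def restore(self, name):
        """Return every segment linked by the snapshot."""
        return set(self.snapshots[name])


repo = ToyRepository()
print(sorted(repo.snapshot("snapshot1", {1, 2, 3})))     # [1, 2, 3]
print(sorted(repo.snapshot("snapshot2", {1, 2, 3, 4})))  # [4]
print(sorted(repo.delete("snapshot1")))                  # []
print(sorted(repo.restore("snapshot2")))                 # [1, 2, 3, 4]
```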

Make sense?


Makes sense, thank you David!

Hi @dadoonet,

I took Snapshot1 of the es_item index in PROD and restored it on the DR (Disaster Recovery) server for backup purposes.

I then took Snapshot2 of the same es_item index (this snapshot will be incremental).

How can we find the difference between Snapshot1 and Snapshot2 (the incremental one)?

Thanks,
Ganeshbabu R

I don't really know. IIRC, the files used by each snapshot are recorded in a metadata file.

The question is more: "What exactly do you want to know?"

You have to understand that a snapshot backs up shards, not individual documents. So the difference you are looking for won't tell you which documents were backed up on the second run.
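
To illustrate, here is a minimal sketch of what a file-level difference looks like, assuming you somehow had the list of files each snapshot links to for one shard. The file names and the `file_level_diff` helper are hypothetical; this is not an Elasticsearch API.

```python
def file_level_diff(files_a, files_b):
    """Compare the file lists of two snapshots of the same shard."""
    a, b = set(files_a), set(files_b)
    return {
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
        "shared": sorted(a & b),
    }

# Hypothetical segment file names for one shard.
snapshot1_files = ["_0.cfs", "_1.cfs", "_2.cfs"]
snapshot2_files = ["_0.cfs", "_1.cfs", "_2.cfs", "_3.cfs"]

diff = file_level_diff(snapshot1_files, snapshot2_files)
print(diff["only_in_second"])  # ['_3.cfs']
```

The result names files, not documents: it tells you that a segment file is new, but nothing about which documents were added, updated or deleted.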

Makes sense?

This blog post describes how it works quite well.

Thanks for your response @dadoonet

Please check the URL below, where I describe the snapshot difference I want to find.

In my scenario, if I can find the difference between snapshots, I can build a workaround to back up the prod data.

Please check it and let me know your feedback.

Thanks,
Ganeshbabu R

Thanks for this reference @Christian_Dahlqvist

I now clearly understand that snapshots are taken entirely at the file level.

Regards,
Ganeshbabu R

This is missing from the documentation. Saved my day!