How to reduce snapshot sizes

Jianzhou_Z · October 5, 2018, 3:33am

The thread "Backup repository size is much bigger than indices size" discusses the problem of snapshots growing too large.

I do not think removing old snapshots is the solution: if an old snapshot has segments that newer snapshots do not have, we would not be able to recover that data after removing the old snapshot.

Does Elasticsearch have a way to identify whether any snapshots are safe to remove because the latest snapshots 'cover' them? Or does Elasticsearch have a way to clean up old backed-up segments that are covered by the latest segments?

The other option is to periodically generate a new snapshot from scratch, but I am not sure whether this is the best solution.

Christian_Dahlqvist · October 5, 2018, 5:43am

Segments are reference counted, so will not be removed as long as there is one snapshot that uses them even if they were copied as part of a snapshot that is being deleted. You can therefore safely remove older snapshots without compromising the integrity of newer ones.
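A minimal sketch of the reference-counting idea, as a toy Python model. The `ToyRepository` class and segment names are hypothetical and this is not how Elasticsearch implements deletion internally; it only assumes that each snapshot records the full set of segment files it needs and that a file is removed only when no remaining snapshot uses it.

```python
from collections import Counter


class ToyRepository:
    """Hypothetical toy model of a snapshot repository (not Elasticsearch code).

    Each snapshot records the full set of segment files it needs. A file is
    physically removed only when no remaining snapshot references it.
    """

    def __init__(self):
        self.blobs = set()         # segment files physically stored in the repo
        self.snapshots = {}        # snapshot name -> frozenset of segment files
        self.refcount = Counter()  # segment file -> number of snapshots using it

    def take_snapshot(self, name, live_segments):
        files = frozenset(live_segments)
        self.blobs |= files          # files already stored are not copied again
        self.snapshots[name] = files
        self.refcount.update(files)  # each snapshot holds one reference per file

    def delete_snapshot(self, name):
        for f in self.snapshots.pop(name):
            self.refcount[f] -= 1
            if self.refcount[f] == 0:  # last reference gone: safe to delete
                del self.refcount[f]
                self.blobs.discard(f)

    def can_restore(self, name):
        # A snapshot is restorable if every file it lists is still stored.
        return self.snapshots[name] <= self.blobs


repo = ToyRepository()
repo.take_snapshot("S0", {"seg_a", "seg_b"})
repo.take_snapshot("S1", {"seg_a", "seg_b", "seg_c"})
repo.delete_snapshot("S0")
print(repo.can_restore("S1"))  # True
print(sorted(repo.blobs))      # ['seg_a', 'seg_b', 'seg_c']
```

Deleting S0 only decrements the counts for seg_a and seg_b; because S1 still references them, nothing S1 needs is removed.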

Jianzhou_Z · October 5, 2018, 5:57am

Is the reference counter visible to users? When deleting an old snapshot, how can I check whether all its segments have a reference count of 0?

Christian_Dahlqvist · October 5, 2018, 5:59am

It is handled automatically by the snapshot process and, as far as I know, is not visible to users. This blog post is a bit old, but it describes quite well how this works.

Jianzhou_Z · October 5, 2018, 6:17am

Does this mean the snapshot process automatically 'recycles' unused segments?
That does not seem possible if it does not know which snapshots users do not want to keep.

Also, this may not answer what I asked; my question may have been confusing, so let me explain it with an example.

I have a full snapshot S0. After that I take daily snapshots S1, S2, ... Sn.
I only plan to restore from the latest snapshot, Sn.

As n grows, the total size of all Si keeps growing.

Based on https://www.elastic.co/blog/found-elasticsearch-snapshot-and-restore, each incremental snapshot Si contains only the changed segments from the last snapshot.

So a very old Sj might refer to segA and segB, while a newer Si (i > j) refers to the new segA' and segB'. If I only restore from Sn (n >= i), I only need segA' and segB', not segA and segB.
So in my case, it is safe to remove segA and segB from the repository, which would reduce its size.

However, I do not think Elasticsearch can do this automatically, because segA and segB are still needed if anyone wants to restore from Sj.

Would another way to reduce the size be to create a new full snapshot from scratch every week?

Christian_Dahlqvist · October 5, 2018, 6:21am

If snapshot S0 copies a segment that then does not change, Snapshot S1 will then not copy it again but instead reference it. If you then delete snapshot S0, this segment stays in the repository as it is still used by snapshot S1.

Snapshot S1 will therefore contain all segments that were present in the cluster at the time it was taken, no matter when they were copied.
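The incremental behaviour described above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's code: it assumes segment files are identified by name alone (Lucene segment files are immutable once written, so a stored file never changes), and `incremental_snapshot` is a hypothetical helper. It shows that S1 uploads only the new file, yet its manifest still lists every live segment.

```python
def incremental_snapshot(stored_files, live_segments):
    """Return (files_to_upload, manifest) for a new snapshot.

    Simplified model: a file already in the repository is referenced,
    not copied again; the manifest still lists every live segment.
    """
    live = set(live_segments)
    to_upload = live - set(stored_files)  # only files the repo does not have yet
    return to_upload, frozenset(live)     # manifest = everything S needs to restore


stored = set()
up0, s0 = incremental_snapshot(stored, {"seg_a", "seg_b"})
stored |= up0
up1, s1 = incremental_snapshot(stored, {"seg_a", "seg_b", "seg_c"})
stored |= up1
print(sorted(up1))  # ['seg_c']: only the new segment is copied
print(sorted(s1))   # ['seg_a', 'seg_b', 'seg_c']: S1 still lists all three
```

Because the manifest of S1 is complete on its own, restoring S1 does not depend on S0 still existing, only on the files S1 lists still being in the repository.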

Jianzhou_Z · October 5, 2018, 6:33am

So if I always restore from the latest snapshot, I can safely remove any snapshot older than 7 days. Is that correct?

This may not save much space because of the shared references, but it can at least potentially save some.
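A sketch of the 7-day retention rule in Python, under stated assumptions: the input is a hypothetical mapping of snapshot name to start time (in a real cluster this information could be read from `GET _snapshot/<repo>/_all`, and each selected snapshot removed with `DELETE _snapshot/<repo>/<snapshot>`), and `snapshots_to_delete` is an illustrative helper, not an Elasticsearch API. It never selects the most recent snapshot, since that is the one being restored from.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, max_age=timedelta(days=7), now=None):
    """Pick snapshots older than max_age, never the most recent one.

    `snapshots` maps snapshot name -> start time (timezone-aware datetime).
    The input format is an assumption made for this sketch.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if not snapshots:
        return []
    latest = max(snapshots, key=snapshots.get)  # the snapshot we restore from
    return sorted(
        name for name, started in snapshots.items()
        if name != latest and now - started > max_age
    )


now = datetime(2018, 10, 5, tzinfo=timezone.utc)
# Hypothetical daily snapshots S0..S10; S0 is 10 days old, S10 was taken today.
snaps = {f"S{i}": now - timedelta(days=10 - i) for i in range(11)}
print(snapshots_to_delete(snaps, now=now))  # ['S0', 'S1', 'S2']
```

S3 is exactly 7 days old and is kept because the comparison is strictly "older than"; whether the boundary should be inclusive is a policy choice.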

Christian_Dahlqvist · October 5, 2018, 6:42am

Yes, that is correct.

How much you save depends on your indexing patterns. If a lot of segments are merged between snapshots, you could save quite a lot of space.
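To illustrate why merges matter, here is a small Python sketch. When Lucene merges segments it writes a new segment file, so the old files are referenced only by older snapshots. The `bytes_freed` helper, snapshot contents and file sizes below are made-up illustrative values, not output from a real repository; the sketch assumes a file is reclaimed only when no surviving snapshot references it.

```python
def bytes_freed(snapshots, sizes, to_delete):
    """Bytes reclaimed by deleting the snapshots in `to_delete`.

    A file is reclaimed only if no surviving snapshot still references it.
    `snapshots` maps name -> set of segment files; `sizes` maps file -> bytes.
    """
    survivors = set(snapshots) - set(to_delete)
    still_used = set().union(*(snapshots[s] for s in survivors))
    candidates = set().union(*(snapshots[s] for s in to_delete))
    return sum(sizes[f] for f in candidates - still_used)


sizes = {"seg_a": 100, "seg_b": 100, "seg_c": 10, "seg_ab": 200}

# No merges between S0 and S1: S1 still uses seg_a and seg_b.
unchanged = {"S0": {"seg_a", "seg_b"}, "S1": {"seg_a", "seg_b", "seg_c"}}
print(bytes_freed(unchanged, sizes, ["S0"]))  # 0

# seg_a and seg_b were merged into seg_ab before S1 was taken.
merged = {"S0": {"seg_a", "seg_b"}, "S1": {"seg_ab", "seg_c"}}
print(bytes_freed(merged, sizes, ["S0"]))  # 200
```

Deleting S0 frees nothing in the first case and all of seg_a and seg_b in the second, which matches the point that the savings depend on how much merging happens between snapshots.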

system · November 2, 2018, 6:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
How do I not include data from old snapshots in newer ones?	Elasticsearch	7	1006	July 5, 2017
Elasticsearch Incremental Snapshot	Elasticsearch	9	612	July 5, 2017
What triggers a snapshot repository cleanup?	Elasticsearch	3	2127	March 24, 2017
Incremental snapshot after document update and segment	Elasticsearch (snapshot-and-restore)	1	207	October 24, 2022
Incremental snapshots - delete index, take another snapshot and then restore the index	Elasticsearch	2	1388	July 5, 2017