How to reduce snapshot sizes

The topic "Backup repository size is much bigger than indices size" discusses the problem of snapshots becoming too large.

I do not think removing old snapshots is the solution, because if an old snapshot has segments that newer snapshots do not have, we will not be able to recover that data after removing it...

Does Elasticsearch have a way to identify whether any snapshots are safe to remove because later snapshots 'cover' them? Or does Elasticsearch have a way to clean up old backup segments that are superseded by the latest segments?

Another option is to periodically generate a new snapshot from scratch... but I am not sure this is the best approach.

Segments are reference counted, so they will not be removed as long as at least one snapshot still uses them, even if they were originally copied as part of a snapshot that is being deleted. You can therefore safely remove older snapshots without compromising the integrity of newer ones.
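For example, removing an old snapshot is a single API call, and the repository keeps every segment file that a remaining snapshot still references. A minimal sketch using the snapshot REST API from Python (the cluster address, repository and snapshot names are only examples):

```python
import requests

ES = "http://localhost:9200"   # assumed local cluster
REPO = "my_backup"             # hypothetical repository name
OLD_SNAPSHOT = "daily-0001"    # hypothetical old snapshot to delete

# Delete the old snapshot. Elasticsearch only removes repository files that
# no remaining snapshot references; shared segments stay in place.
resp = requests.delete(f"{ES}/_snapshot/{REPO}/{OLD_SNAPSHOT}")
resp.raise_for_status()
print(resp.json())  # {'acknowledged': True} when the deletion succeeds
```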

Is the reference counter visible to users? When deleting an old snapshot, how can I check whether all of its segments have a ref-count of 0?

It is handled automatically by the snapshot process and is not visible, as far as I know. This blog post is a bit old but describes how it works quite well.

Does that mean the snapshot process automatically 'recycles' unused segments?
I feel that is not really possible if it does not know which snapshots users no longer want to keep.

Also, I feel this may not be what I asked; perhaps my question was confusing, so I will give an example to explain it.

I have a full snapshot S0. After that I take daily snapshots S1, S2, ..., Sn.
I only plan to restore from the latest snapshot Sn.
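For concreteness, the daily snapshots are created roughly like this (a sketch; the cluster address, repository name and naming scheme are just examples):

```python
import datetime
import requests

ES = "http://localhost:9200"   # assumed local cluster
REPO = "my_backup"             # hypothetical repository name

# Take today's incremental snapshot Si: only segments not already present in
# the repository are copied, everything else is referenced from earlier ones.
name = "daily-" + datetime.date.today().isoformat()
resp = requests.put(
    f"{ES}/_snapshot/{REPO}/{name}",
    params={"wait_for_completion": "true"},
)
resp.raise_for_status()
print(resp.json()["snapshot"]["state"])  # e.g. "SUCCESS"
```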

As n gets larger, the total size of all the Si keeps growing.

Based on https://www.elastic.co/blog/found-elasticsearch-snapshot-and-restore, each incremental snapshot Si contains only the changed segments from the last snapshot.

So it could be that a very old Sj refers to segA and segB, while a newer Si (i > j) refers to the new segA' and segB'. If I only restore from Sn (n >= i), we only need segA' and segB', not segA and segB.
So in my case it would be safe to remove segA and segB from the repository, which would reduce its size.

However, I do not think Elasticsearch can do this automatically, because we do still need segA and segB if anyone wants to restore from Sj.

Would another way to reduce the size be to create a new full snapshot from scratch every week?
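What I have in mind would look roughly like the sketch below: since snapshots are only incremental within a single repository, registering a fresh, empty repository each week forces a full copy (the names and path are hypothetical):

```python
import requests

ES = "http://localhost:9200"                          # assumed local cluster
NEW_REPO = "weekly_full_2024_w01"                     # hypothetical weekly repository
NEW_REPO_PATH = "/mnt/backups/weekly_full_2024_w01"   # hypothetical shared path

# Register a new shared-filesystem repository; the first snapshot into an
# empty repository has to copy every segment, i.e. it is a full snapshot.
requests.put(
    f"{ES}/_snapshot/{NEW_REPO}",
    json={"type": "fs", "settings": {"location": NEW_REPO_PATH}},
).raise_for_status()

# Take the weekly full snapshot into the new repository.
requests.put(
    f"{ES}/_snapshot/{NEW_REPO}/full-snapshot",
    params={"wait_for_completion": "true"},
).raise_for_status()
```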

If snapshot S0 copies a segment that then does not change, snapshot S1 will not copy it again but instead reference it. If you then delete snapshot S0, this segment stays in the repository because it is still used by snapshot S1.

Snapshot S1 will therefore contain all segments that were present in the cluster at the time it was taken no matter when they were copied.
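You can check this with the snapshot status API: the per-snapshot totals do not shrink when an older snapshot is deleted, because S1 still references all segments that were live when it was taken. A sketch (repository and snapshot names are hypothetical, and the exact layout of the stats varies a little between versions):

```python
import requests

ES = "http://localhost:9200"   # assumed local cluster
REPO = "my_backup"             # hypothetical repository name
SNAPSHOT = "daily-0002"        # hypothetical newer snapshot (S1)

# _status reports file counts and sizes for the snapshot; compare before and
# after deleting S0 to see that nothing S1 needs has disappeared.
resp = requests.get(f"{ES}/_snapshot/{REPO}/{SNAPSHOT}/_status")
resp.raise_for_status()
print(resp.json()["snapshots"][0]["stats"])
```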


Does this mean that, if I always restore from the latest snapshot, I can always safely remove snapshots older than 7 days? Is that correct?

This may not save much space because of the segment references, but at least it could potentially save some.

Yes, that is correct.

This will depend on your indexing patterns. If a lot of segments are merged between snapshots you could save quite a lot of space.
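As a concrete example, a retention job along those lines could look like the sketch below: list the snapshots in the repository and delete everything that started more than 7 days ago, always keeping the most recent one. The repository name and retention window are just examples, and newer Elasticsearch versions can also automate this via snapshot lifecycle management policies.

```python
import time
import requests

ES = "http://localhost:9200"        # assumed local cluster
REPO = "my_backup"                  # hypothetical repository name
MAX_AGE_MS = 7 * 24 * 3600 * 1000   # 7-day retention window

# List all snapshots in the repository and sort them oldest-first.
resp = requests.get(f"{ES}/_snapshot/{REPO}/_all")
resp.raise_for_status()
snapshots = sorted(resp.json()["snapshots"], key=lambda s: s["start_time_in_millis"])

now_ms = int(time.time() * 1000)
for snap in snapshots[:-1]:  # never delete the most recent snapshot
    if now_ms - snap["start_time_in_millis"] > MAX_AGE_MS:
        # Only segment files no remaining snapshot references are actually
        # removed from the repository.
        requests.delete(f"{ES}/_snapshot/{REPO}/{snap['snapshot']}").raise_for_status()
        print("deleted", snap["snapshot"])
```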
