Backup repository s3 bucket size is almost 5 times the actual index size

Christian_Dahlqvist · February 3, 2025, 9:16am

@DavidTurner I believe this is true at the segment level, not the document level? If an index is receiving a heavy indexing load, couldn't documents end up in multiple segments snapshotted at different times as merging continuously occurs, even if the documents themselves are never updated?

DavidTurner · February 3, 2025, 9:21am

Yes, that's true, Christian: any segments (and therefore shards and indices) that don't change between snapshots take up no extra space, but merging causes some amount of write amplification, which becomes more noticeable at higher snapshot frequencies.
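To make the segment-level point concrete, here is a toy model (not Elasticsearch code) of a repository that stores each distinct segment file once, however many snapshots reference it. The segment names and sizes are hypothetical. It shows how a merge that rewrites old segments into a new one makes the repository grow even though no document changed.

```python
# Toy model of segment-level deduplication in a snapshot repository.
# Hypothetical data: segment names/sizes are invented for illustration.

def repo_size(snapshots):
    """Bytes stored if each distinct segment is uploaded only once."""
    unique = {}
    for snap in snapshots:
        for name, size in snap.items():
            unique[name] = size
    return sum(unique.values())


def amplification(snapshots):
    """Repository size relative to the size of the newest snapshot."""
    latest = sum(snapshots[-1].values())
    return repo_size(snapshots) / latest


# Day 1: three segments totalling 300 bytes.
day1 = {"_a": 100, "_b": 100, "_c": 100}
# Day 2 without merges: old segments are reused, one new segment is added.
day2_no_merge = {"_a": 100, "_b": 100, "_c": 100, "_d": 100}
# Day 2 where a merge rewrote _a, _b and _c into a new segment _e.
day2_merged = {"_e": 300, "_d": 100}

print(amplification([day1, day2_no_merge]))  # 1.0
print(amplification([day1, day2_merged]))    # 1.75
```

The same documents are now stored twice (in `_a`/`_b`/`_c` and again in `_e`) until the day-1 snapshot is deleted. That is the write amplification merging causes.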

Dharani_Vattamwar · February 3, 2025, 11:46am

Okay, but since snapshots also impact normal query speed to some extent, we would like to take snapshots only at night, when load is lower. So, assuming daily snapshots, can we use 2x the index size as the upper limit for the S3 bucket size when estimating cost?
any segments (and therefore shards and indices) that don't change between snapshots take up no extra space -> this shouldn't affect our case much, as our indices mostly don't change over a 24-hour period.
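A nightly schedule like this is normally set up as a snapshot lifecycle management (SLM) policy. The sketch below only builds the JSON body for `PUT _slm/policy/<policy_id>`; the policy id `nightly-snapshots`, the repository name `my_s3_repo`, the 01:00 run time and the retention values are all hypothetical, not taken from this thread.

```python
import json

# Hypothetical SLM policy body; adjust repository, time and retention to suit.
policy = {
    # Elasticsearch cron syntax: seconds minutes hours day-of-month month day-of-week.
    # Fires once a day at 01:00 (assumed to be evaluated in UTC).
    "schedule": "0 0 1 * * ?",
    # Date-math snapshot name, resolved when each snapshot is taken.
    "name": "<nightly-snap-{now/d}>",
    "repository": "my_s3_repo",
    "config": {"indices": ["*"], "include_global_state": False},
    # Keep snapshots for 30 days, but always between 5 and 50 of them.
    "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
}

# Body to send with: PUT _slm/policy/nightly-snapshots
print(json.dumps(policy, indent=2))
```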

DavidTurner · February 3, 2025, 12:40pm

It Depends™ on the exact pattern of writes, but yeah, typically I'd expect daily snapshots to see a write amplification factor lower than 2x.

Note that daily snapshots are not quite sufficient to achieve an RPO of 1 day, because creating the snapshot itself takes time. I would recommend snapshotting at least twice as often as your target RPO, i.e. no more than 12h between snapshots in your case.
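The arithmetic behind that recommendation, as a sketch under simplifying assumptions: each snapshot captures data roughly as of its start time, a snapshot still in progress when disaster strikes is lost, and the snapshot duration is shorter than the interval. The worst moment is then just before the next snapshot finishes, so the worst-case data loss is one interval plus one snapshot duration. The 45-minute duration is hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval, duration):
    """Worst-case age of the newest *completed* snapshot at disaster time.

    Assumes snapshots capture data as of their start, an in-progress
    snapshot is lost, and duration < interval.
    """
    return interval + duration


rpo = timedelta(hours=24)
duration = timedelta(minutes=45)  # hypothetical snapshot duration

for hours in (24, 12):
    loss = worst_case_data_loss(timedelta(hours=hours), duration)
    print(f"every {hours}h: worst case {loss}, meets RPO: {loss <= rpo}")
```

With a 24h interval the worst case is 24h45m, which misses a 24h RPO; with 12h it is 12h45m.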

RainTown · February 3, 2025, 3:09pm

With your dev system, and now that you've figured out the S3 versioning thing, you don't really need to make back-of-the-envelope estimates.

Just use the dev system (which I assume is fairly representative) to measure it for your own data flows.
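One way to turn dev-system measurements into a factor, sketched under assumptions: the snapshot status API (`GET _snapshot/<repo>/<snapshot>/_status`) reports per snapshot how many bytes it actually uploaded, which is assumed here to appear as `stats.incremental.size_in_bytes`. Summing that over a representative period and dividing by the live index size gives a rough amplification factor. The response below is trimmed to the assumed fields and every number is made up.

```python
GB = 10**9


def estimate_amplification(status_response, live_index_bytes):
    """Bytes uploaded by the listed snapshots divided by the live index size.

    `status_response` is assumed to mirror the _status API JSON, with each
    entry carrying stats.incremental.size_in_bytes. Space freed by deleting
    old snapshots is ignored, so this errs on the high side.
    """
    uploaded = sum(
        s["stats"]["incremental"]["size_in_bytes"]
        for s in status_response["snapshots"]
    )
    return uploaded / live_index_bytes


# Hypothetical week on a dev cluster: one full snapshot, then six incrementals.
dev_status = {
    "snapshots": [{"snapshot": "day-1", "stats": {"incremental": {"size_in_bytes": 100 * GB}}}]
    + [
        {"snapshot": f"day-{d}", "stats": {"incremental": {"size_in_bytes": 10 * GB}}}
        for d in range(2, 8)
    ]
}

factor = estimate_amplification(dev_status, live_index_bytes=100 * GB)
print(f"amplification factor: {factor:.2f}")  # amplification factor: 1.60
```

If S3 versioning is enabled on the bucket, noncurrent object versions also take up space and won't show up in these stats, so it is worth comparing against the bucket's own size metrics too.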

I agree with David on taking snapshots more than once a day. Let's say disaster hits you at 23:00 (and why wouldn't it?): if your last snapshot was at ca. midnight, you've lost almost an entire day's worth of data straight away, even if you are able to "recover" the rest in a timely way. And I know nothing about your infrastructure, but recovering 150TB of data from S3 onto the production system is likely going to take a while, not even counting the time it will take to bring up your new cluster (under extreme stress).
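A quick way to put a floor under "a while": divide the data size by the aggregate throughput you can sustain reading from S3. The throughputs below are hypothetical placeholders, and the result ignores cluster provisioning, shard recovery and any other overhead, so real restores will take longer.

```python
TB = 10**12
GB = 10**9


def restore_hours(data_bytes, throughput_bytes_per_s):
    """Lower bound on restore time: pure transfer, no setup or recovery overhead."""
    return data_bytes / throughput_bytes_per_s / 3600


# Hypothetical aggregate S3 read throughputs in GB/s.
for gbps in (0.5, 1, 2):
    print(f"{gbps} GB/s: {restore_hours(150 * TB, gbps * GB):.1f} h")
# 0.5 GB/s: 83.3 h
# 1 GB/s: 41.7 h
# 2 GB/s: 20.8 h
```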

One point, from someone who has worked in operations roles for much of the last 30+ years:

because the amount of data that comes in is quite huge we feel 30 mins will not be sufficient to complete the backup and we may have 2 concurrent backups running

Concurrent snapshots are ok, there's no need to settle for a worse RPO just to avoid them
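To see how much overlap a schedule implies, here is a simplified sketch assuming every snapshot starts exactly on schedule and always takes the same time. The peak number running at once is then the duration divided by the interval, rounded up. The 30- and 45-minute figures are illustrative.

```python
import math


def max_concurrent_snapshots(interval_min, duration_min):
    """Peak number of snapshots running at once, assuming fixed start times
    every `interval_min` minutes and a constant `duration_min` per snapshot."""
    return math.ceil(duration_min / interval_min)


print(max_concurrent_snapshots(30, 45))  # 2
print(max_concurrent_snapshots(30, 25))  # 1
```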

This isn't a technical point, but generally the mindset of Operations is that BAU has a certain rhythm, and one aspect of it is that backups don't run during busy times and certainly don't overlap. So snap@1230 starting before snap@1200 finishes makes people really nervous, even if it shouldn't and even though it's technically fine.

e.g. @Dharani_Vattamwar wrote:

snapshots also impacts the normal queries speed to some level, we would like to take snapshots only during night where load is less

You see!

I wonder whether taking a snapshot really does noticeably impact search speed. Has that actually been measured here? But whatever the answer, there is that lurking suspicion that it might.

DavidTurner · February 3, 2025, 3:26pm

Well, yeah, but as the OP mentioned, their RPO is 1 day, so they're OK with this.

The bigger problem is that if you schedule snapshots at 00:00 and they typically take 45 minutes, then when disaster strikes at 00:30 your ongoing snapshot will fail, but the previous snapshot started more than 24h ago, so you miss your RPO.
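That scenario can be checked with a small timeline simulation, assuming snapshots start every `interval` from a fixed time, each takes `duration`, each captures data as of its start, and one still running at disaster time is lost. The dates are arbitrary; the 00:00 start, 45-minute duration and 00:30 disaster come from the example above.

```python
from datetime import datetime, timedelta


def data_age_at(disaster, first_start, interval, duration):
    """Age of the data in the newest snapshot completed before `disaster`,
    or None if no snapshot has completed yet."""
    # Index of the last snapshot that started at or before the disaster.
    k = (disaster - first_start) // interval
    # Step back until we find a snapshot that had also finished.
    while k >= 0 and first_start + k * interval + duration > disaster:
        k -= 1
    if k < 0:
        return None
    return disaster - (first_start + k * interval)


rpo = timedelta(hours=24)
first = datetime(2025, 2, 2, 0, 0)
disaster = datetime(2025, 2, 3, 0, 30)

for hours in (24, 12):
    age = data_age_at(disaster, first, timedelta(hours=hours), timedelta(minutes=45))
    print(f"every {hours}h: data is {age} old, RPO missed: {age > rpo}")
# every 24h: data is 1 day, 0:30:00 old, RPO missed: True
# every 12h: data is 12:30:00 old, RPO missed: False
```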

Sure, if you want to impose extra constraints then that's up to you, as long as you're aware that they're your constraints rather than anything ES is forcing upon you.
