Backup repository s3 bucket size is almost 5 times the actual index size

Thanks for sharing the data.

The first thing I notice is that you are not taking snapshots of a specific index pattern or data stream; you are snapshotting everything.

The second thing I notice is that the manual index deletion has not been happening daily, e.g. several indices from late December were seemingly deleted on the same day.

The third thing I notice is that .ds-nfr-ontrack-2025.01.06-2025.01.06-000001 is missing from the first snapshot (Feb 1) but is present in the other two, whereas .ds-nfr-ontrack-2025.01.05-2025.01.05-000001 and .ds-nfr-ontrack-2025.01.07-2025.01.07-000001 are in all three snapshots. So maybe someone deleted .ds-nfr-ontrack-2025.01.06-2025.01.06-000001 a bit early?

I am guessing there are a few errors creeping in.

The total reported size of all three snapshots is far less than 60 TB. It is the sum of:

% jq '.snapshots[].stats.total.size_in_bytes' snap1.json snap2.json snap3.json
4274445868309
6877399517217
6798431045213

which is

% echo $(( 4274445868309 + 6877399517217 + 6798431045213 ))
17950276430739
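
To put that byte count in the same units as the 60 TB figure, here is a small sketch that converts it to decimal terabytes and binary tebibytes. The helper name `bytes_to_tb` is just something I made up for illustration; it only relies on standard awk.

```shell
# Convert a byte count to decimal terabytes (10^12 bytes)
# and binary tebibytes (2^40 = 1099511627776 bytes).
# bytes_to_tb is a hypothetical helper name, not part of any tool.
bytes_to_tb() {
  awk -v b="$1" 'BEGIN { printf "%.2f TB / %.2f TiB\n", b / 1e12, b / 1099511627776 }'
}

bytes_to_tb 17950276430739
```

So the three snapshots together report roughly 18 TB, and that is before accounting for any segments they share.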

Please also note @Christian_Dahlqvist's points above.

Do you have CloudTrail logging of S3 events? When your snapshot deletion runs, you should see a bunch of S3 objects being deleted. For example, between your snapshots on Jan 30 and Feb 1, the following indices are no longer referenced in the newer snapshot:

.ds-nfr-ontrack-2024.12.25-2024.12.25-000001
.ds-nfr-ontrack-2024.12.26-2024.12.26-000001
.ds-nfr-ontrack-2024.12.27-2024.12.27-000001
.ds-nfr-ontrack-2024.12.28-2024.12.28-000001
.ds-nfr-ontrack-2024.12.29-2024.12.29-000001
.ds-nfr-ontrack-2024.12.30-2024.12.30-000001
.ds-nfr-ontrack-2024.12.31-2024.12.31-000001
.ds-nfr-ontrack-2025.01.06-2025.01.06-000001

They are still referenced by the older snapshot, though. Once you delete that snapshot, the one from Jan 30, none of those indices will be referenced by any remaining snapshot, so that is when their files should be deleted from S3.
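
If you want to repeat that comparison for other pairs of snapshots, something like the following sketch should work. It assumes you have already written each snapshot's index names to a text file, one name per line; the file names and the `dropped_indices` helper are hypothetical, and the demo input is only a small subset of the indices mentioned above.

```shell
# Print index names that appear in the older snapshot's list
# but not in the newer one.
# Arguments: two text files with one index name per line (hypothetical inputs).
dropped_indices() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort -u "$1" > "$old_sorted"
  sort -u "$2" > "$new_sorted"
  # comm -23 keeps only the lines unique to the first (older) file
  comm -23 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Demo with an illustrative subset of the index names
demo_dir=$(mktemp -d)
printf '%s\n' \
  .ds-nfr-ontrack-2024.12.31-2024.12.31-000001 \
  .ds-nfr-ontrack-2025.01.05-2025.01.05-000001 \
  .ds-nfr-ontrack-2025.01.06-2025.01.06-000001 > "$demo_dir/jan30.txt"
printf '%s\n' \
  .ds-nfr-ontrack-2025.01.05-2025.01.05-000001 > "$demo_dir/feb1.txt"

dropped_indices "$demo_dir/jan30.txt" "$demo_dir/feb1.txt"
```

Anything this prints for the oldest snapshot you are about to delete is a candidate to disappear from the bucket, provided no other remaining snapshot still lists it.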
