Elasticsearch Snapshot

If you are using daily indices, I assume you are only writing to the current one. If this is the case, older indices will effectively become read-only (you may want to enforce this through an ILM policy) and their segments no longer change after the first couple of days. When those indices are later deleted from the cluster, that does not affect the segments already stored in earlier snapshots.
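To make the ILM suggestion concrete, here is a minimal sketch of a policy body that marks an index read-only once it is two days old. The policy name `daily-readonly` and the `2d` threshold are assumptions for illustration only (the threshold is just "the first couple of days" turned into a number); adjust both to your own setup.

```python
import json

# Hypothetical policy: the name and the 2-day threshold are assumptions,
# not values taken from this thread's cluster.
POLICY_NAME = "daily-readonly"

ilm_policy = {
    "policy": {
        "phases": {
            "warm": {
                # Enter the warm phase 2 days after index creation.
                "min_age": "2d",
                # The readonly action blocks further writes to the index.
                "actions": {"readonly": {}},
            }
        }
    }
}

# This JSON is the request body for: PUT _ilm/policy/daily-readonly
print(f"PUT _ilm/policy/{POLICY_NAME}")
print(json.dumps(ilm_policy, indent=2))
```

The policy still has to be attached to the daily indices (for example through an index template) before it does anything.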

Why not? I can test this on a single node on a laptop by using smaller indices and reduced data volumes.

You can remove the older snapshots if you don't need the old backups anymore.

You’re asking questions in slightly roundabout ways, drip-feeding new information and random restrictions.

2 pieces of advice:

Any testing you do, to be truly informative, should be on a system whose patterns are as similar as possible to your production use case. Same as any sort of integration testing.

You seem concerned with what you call bloat. Consider carefully the scenarios your snapshots are intended to protect against or mitigate. Your production dataset is 600GB; by most standards that’s pretty small. I don’t know how many restorable snapshots you are aiming to keep, or for how long, but the implication is you are trying to minimise this? Bad things can’t be guaranteed to happen at convenient times, nor are all bad things picked up right away.

I created a backup of an Elasticsearch cluster that was 716 MB in size. After taking the snapshot, the folder size on the operating system was as follows. Subsequently, I added an index that was 142.8 MB, and after taking another snapshot, the size of the "indices" folder increased to 864 MB, which was expected. However, after dropping the index and taking another snapshot, I noticed that the size did not decrease. Shouldn't the related segments be removed once we drop the index?

[root@prestovm1 clusterbkp]# du -sh indices/
718M    indices/

Please help me to understand.

Did you delete the first snapshot that holds the segments that were removed?

No. When adding or removing indices, it is essential to take a fresh snapshot of the cluster and delete any old snapshots that are no longer needed. This approach ensures efficient space management at the OS level.

Sorry to disturb you for these basic things.

Thanks,
Debasis

Do you mean you only ever want one and exactly one snapshot, taken X hours/days/whatever ago, on your disk repository? On an ongoing basis?

Anything that was in an elasticsearch index BEFORE that single last snapshot was taken, but was subsequently deleted, is of no interest to you?

(I would not personally call that an effective backup strategy)

There seems to be a misunderstanding here.

If you made a snapshot when indexA was part of the cluster, and have not actively removed that snapshot, then you should still have a restorable snapshot that could be used to restore indexA to its state at the time that snapshot was made.

If you subsequently delete indexA, create and populate indexB, and then make another snapshot, you should then have 2 restorable snapshots, 2 restore points, albeit in the same repo.
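The indexA/indexB scenario can be sketched with a toy model. This is not how Elasticsearch is implemented internally; `SnapshotRepo`, the segment ids, and the sizes are all made up for illustration. It only models the one behaviour that matters here: a segment stays in the repo for as long as at least one snapshot still references it.

```python
class SnapshotRepo:
    """Toy model of a repository that stores each segment file only once."""

    def __init__(self):
        self.snapshots = {}      # snapshot name -> {index name: set of segment ids}
        self.segment_sizes = {}  # segment id -> size in MB

    def take_snapshot(self, name, cluster):
        # cluster: {index name: {segment id: size in MB}}
        contents = {}
        for index, segments in cluster.items():
            contents[index] = set(segments)
            for seg_id, size in segments.items():
                # A segment already in the repo is referenced, not copied again.
                self.segment_sizes.setdefault(seg_id, size)
        self.snapshots[name] = contents

    def delete_snapshot(self, name):
        del self.snapshots[name]
        still_referenced = set()
        for contents in self.snapshots.values():
            for segments in contents.values():
                still_referenced |= segments
        # Only segments that no remaining snapshot references are removed.
        for seg_id in list(self.segment_sizes):
            if seg_id not in still_referenced:
                del self.segment_sizes[seg_id]

    def size_mb(self):
        return sum(self.segment_sizes.values())

    def restore_points(self, index):
        return [name for name, contents in self.snapshots.items() if index in contents]


repo = SnapshotRepo()
cluster = {"indexA": {"a1": 400, "a2": 300}}
repo.take_snapshot("snap1", cluster)
print(repo.size_mb())                 # 700

# Delete indexA from the cluster, create indexB, snapshot again.
del cluster["indexA"]
cluster["indexB"] = {"b1": 150}
repo.take_snapshot("snap2", cluster)
print(repo.size_mb())                 # 850: snap1 still needs indexA's segments
print(repo.restore_points("indexA"))  # ['snap1']
print(repo.restore_points("indexB"))  # ['snap2']

# Space is only reclaimed once the snapshot holding indexA is deleted.
repo.delete_snapshot("snap1")
print(repo.size_mb())                 # 150
```

In this model, dropping the index from the cluster never shrinks the repo by itself. The size only goes down when the last snapshot referencing those segments is removed, at which point that restore point is gone too.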

On the slightly wider topic …

In my experience most people would work with a number of restore points in their repos. This is similar to old-school tape backups, where retention policies would eventually allow tapes to be re-used. But in these things, and their uglier great-uncle Disaster Recovery, there are usually other complications. There’s nothing wrong with trying to keep things simple, but please keep in mind the reasons you are making snapshots. Maybe a slight shift of focus away from trying to minimise bytes used. IMHO
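As a rough illustration of the tape-style retention idea, here is a small sketch that picks which snapshots are old enough to drop while always keeping a minimum number of recent restore points. The function name, the snapshot names, and the 7-snapshot / 14-day values are assumptions for the example, not a recommendation. It is loosely similar in spirit to the `expire_after` and `min_count` retention settings in snapshot lifecycle management, but it is not that implementation.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, min_keep, max_age):
    """Return names of snapshots older than max_age, never touching the min_keep newest.

    snapshots: list of (name, taken_at) tuples.
    """
    newest_first = sorted(snapshots, key=lambda snap: snap[1], reverse=True)
    expired = []
    for position, (name, taken_at) in enumerate(newest_first):
        if position < min_keep:
            continue  # always keep the most recent restore points
        if now - taken_at > max_age:
            expired.append(name)
    return expired


# Hypothetical data: one snapshot per day for the first 30 days of January.
now = datetime(2024, 1, 31)
snaps = [(f"daily-{day:02d}", datetime(2024, 1, day)) for day in range(1, 31)]

expired = snapshots_to_delete(snaps, now, min_keep=7, max_age=timedelta(days=14))
print(len(expired))   # 16
print(expired[0])     # daily-16
print(expired[-1])    # daily-01
```

With these values you keep two weeks of restore points, so a problem noticed ten days late can still be rolled back.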