Elastic snapshot to Azure blob not deleting data

I am running Elasticsearch 7.16.3 and snapshotting indices to Azure blob storage for backup. The snapshot process is working, and after a couple of months we go through and delete the old snapshots. I recently looked in the blob storage and saw metadata files at the root from a year or so ago, plus a folder called indices. Inside indices there are many other folders, and when I click into them they contain a meta file and folders with files over a year old, which look like they may be data that was not removed when the snapshot was deleted. This is a sample of a blob folder:
elastic-snaps?sp=racwl&st=2021-09-08T22:03:42Z&se=2022-09-08T17:00:00Z&spr=https&sv=2020-08-04&sr=c&sig=riCZcA3Et49FFQTOLckBDl+d69ZdT0= / indices / -112gSbxSPyWjTzQ /0

Can I go through and delete those files and folders? How can I determine which ones are okay to delete? I was thinking of creating an Azure lifecycle rule to delete files older than the snapshots we keep.

Any ideas?

What is your use case? Are you using time-based indices? What is your retention period within the cluster?

Each Elasticsearch snapshot is a full snapshot, but it does reuse segments that have already been snapshotted. If you have indices in the cluster that are long lived and do not change much, the latest snapshots may be reusing segments from a much older snapshot.

I would therefore recommend never to delete anything from a snapshot repository without using the Elasticsearch APIs.
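For example, removing an old snapshot through the API, so Elasticsearch can work out which blobs are still shared with newer snapshots, looks roughly like this. This is only a sketch using the Python client; the repository and snapshot names are placeholders, not your real ones.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Deleting through the API lets Elasticsearch decide which blobs are
# still referenced by other snapshots and which can safely be removed.
# "elastic-snaps" and "snapshot-2021.01.15" are placeholder names.
es.snapshot.delete(repository="elastic-snaps", snapshot="snapshot-2021.01.15")
```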

We are using it for device event logging. We create a new index every day for that day's events and then create a snapshot for archival. We only need to keep the archive for a year, and we only need the index available in the cluster for 90 days, at which point we delete the index.

So we only keep 90 days in the cluster, and looking at snapshots we only keep a year, but our blob storage has continuously increased. Even though we deleted the older snapshots from two years ago, I still see folders that far back.
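For context, a rough sketch of the daily flow we follow, with illustrative index, repository and snapshot names rather than our real ones:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
today = datetime.utcnow()

# Snapshot today's event index into the Azure-backed repository.
# "elastic-snaps" and the index naming pattern are placeholders.
index_name = f"device-events-{today:%Y.%m.%d}"
es.snapshot.create(
    repository="elastic-snaps",
    snapshot=f"snapshot-{today:%Y.%m.%d}",
    body={"indices": index_name, "include_global_state": False},
    wait_for_completion=False,
)

# Delete the snapshot taken a year ago, always through the API.
old = today - timedelta(days=365)
es.snapshot.delete(repository="elastic-snaps", snapshot=f"snapshot-{old:%Y.%m.%d}")
```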

Echoing Christian's answer, the docs also very much say not to do this:

WARNING: Don’t modify anything within the repository or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the repository then future snapshot or restore operations may fail, reporting corruption or other data inconsistencies, or may appear to succeed having silently lost some of your data.

That includes both manually deleting objects, and setting up lifecycle rules to delete them automatically.

It's possible there are some leftover objects if the deletion process is failing for some reason, but I'd expect that to be reported in the Elasticsearch logs. When a snapshot deletion succeeds it should have cleaned up anything it doesn't need any more, so anything left is still needed.
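If you want to compare what the repository still tracks against what you see in the container, you can list the snapshots it contains. A sketch with a placeholder repository name:

```python
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# List every snapshot the repository still tracks; segment files in the
# blob container may be shared by any of these, however old they look.
# "elastic-snaps" is a placeholder repository name.
resp = es.snapshot.get(repository="elastic-snaps", snapshot="_all")
for snap in resp["snapshots"]:
    started = datetime.utcfromtimestamp(snap["start_time_in_millis"] / 1000)
    print(snap["snapshot"], snap["state"], started.date())
```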


I think I will create a new snapshot repo, age out the old repo, and then delete the old blob container.

That way it will be like a per-year snapshot destination. Can I have multiple Azure snapshot destinations?

Yes, you can.
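For example, an additional Azure-backed repository pointing at a different container (or a different base_path in the same container) can be registered like this. This is just a sketch; the repository, container and base_path names are placeholders.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Register a second Azure-backed repository, e.g. one per year.
# "elastic-snaps-2023" and the container name are placeholders;
# "default" refers to the Azure client configured in the keystore.
es.snapshot.create_repository(
    repository="elastic-snaps-2023",
    body={
        "type": "azure",
        "settings": {
            "client": "default",
            "container": "elastic-snaps-2023",
            "base_path": "snapshots",
        },
    },
)
```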

Have you already tried to clean up the repository?

I had some cases where leftover data was not deleted, and clicking the Clean up repository button directly in the Kibana interface removed it.
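The same cleanup can also be triggered through the API if you prefer; a sketch with a placeholder repository name:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Equivalent of the "Clean up repository" button in Kibana: asks
# Elasticsearch to remove unreferenced blobs from the repository.
# "elastic-snaps" is a placeholder repository name.
resp = es.snapshot.cleanup_repository(repository="elastic-snaps")
print(resp)  # reports how many bytes and blobs were removed
```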

No, I had not seen that option, but I will try it.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.