I am running Elasticsearch 7.16.3 and snapshotting indices to Azure Blob Storage for backup. The snapshot process is working, and after a couple of months we go through and delete the old snapshots. I recently looked in the blob storage and saw metadata files at the root from a year or so ago, plus a folder called indices. Inside indices there are a lot of other folders, and each one contains a meta file and folders with files over a year old. It looks like these may be snapshot data that was not removed when the snapshot was deleted. This is a sample of a blob folder:
elastic-snaps?sp=racwl&st=2021-09-08T22:03:42Z&se=2022-09-08T17:00:00Z&spr=https&sv=2020-08-04&sr=c&sig=riCZcA3Et49FFQTOLckBDl+d69ZdT0= / indices / -112gSbxSPyWjTzQ /0
Can I go through and delete those files and folders? How can I determine which ones are okay to delete? I was thinking of creating an Azure lifecycle rule to delete files older than the snapshots we have.
What is your use case? Are you using time-based indices? What is your retention period within the cluster?
Each Elasticsearch snapshot is a full snapshot, but it does reuse segments that have already been snapshotted. If you have indices in the cluster that are long-lived and do not change much, the latest snapshots may be reusing segments from a much older snapshot.
I would therefore recommend never deleting anything from a snapshot repository without using the Elasticsearch APIs.
We are using it for device event logging. We create a new index every day for that day's events and then create a snapshot for archival. We only need to keep the archive for a year. We only need the index available in the cluster for 90 days, after which we delete the index.
So we only have 90 days of indices in the cluster, and the snapshots only go back a year. But our blob storage has continuously grown even though we deleted the older snapshots from two years ago, and I still see folders dating back that far.
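For reference, the one-year retention described here could be enforced through the snapshot APIs rather than on the blob store. The following is only a minimal sketch: the cluster URL and the repository name `azure_backup` are hypothetical placeholders, authentication is omitted, and it assumes the single-repository response shape of `GET /_snapshot/<repo>/_all` (a top-level `snapshots` list whose entries carry `snapshot` and `start_time_in_millis`).

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

ES_URL = "http://localhost:9200"  # assumption: cluster address, no auth shown
REPO = "azure_backup"             # hypothetical repository name


def expired_snapshots(listing, now, retention_days=365):
    """Return names of snapshots that started before the retention cutoff."""
    cutoff_ms = (now - timedelta(days=retention_days)).timestamp() * 1000
    return [
        snap["snapshot"]
        for snap in listing.get("snapshots", [])
        if snap["start_time_in_millis"] < cutoff_ms
    ]


def list_snapshots():
    """Fetch the snapshot listing for REPO from the get-snapshot API."""
    with urllib.request.urlopen(f"{ES_URL}/_snapshot/{REPO}/_all") as resp:
        return json.load(resp)


def delete_snapshot(name):
    """Delete one snapshot via the API so Elasticsearch decides which blobs go."""
    url = f"{ES_URL}/_snapshot/{REPO}/{urllib.parse.quote(name, safe='')}"
    req = urllib.request.Request(url, method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A scheduled job could then call `expired_snapshots(list_snapshots(), datetime.now(timezone.utc))` and pass each returned name to `delete_snapshot`. Because the deletion goes through Elasticsearch, segments still referenced by newer snapshots stay in place.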
WARNING: Don’t modify anything within the repository or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the repository then future snapshot or restore operations may fail, reporting corruption or other data inconsistencies, or may appear to succeed having silently lost some of your data.
That includes both manually deleting objects, and setting up lifecycle rules to delete them automatically.
It's possible there are some leftover objects if the deletion process is failing for some reason, but I'd expect that to be reported in the Elasticsearch logs. When a snapshot deletion succeeds it should have cleaned up anything it doesn't need any more, so anything left is still needed.
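If leftover objects are suspected, one API-based way to check is the repository cleanup endpoint (`POST /_snapshot/<repo>/_cleanup`), which, as far as I know, is available in 7.x and asks Elasticsearch itself to remove blobs no snapshot references. Below is a hedged sketch: the helper names are made up for illustration, authentication is omitted, and the response fields `deleted_blobs` and `deleted_bytes` are taken from the documented example response.

```python
import json
import urllib.request


def cleanup_request(es_url, repo):
    """Build the POST request for the repository cleanup endpoint."""
    url = f"{es_url.rstrip('/')}/_snapshot/{repo}/_cleanup"
    return urllib.request.Request(url, method="POST")


def summarize_cleanup(response):
    """Turn the cleanup response into a short human-readable summary."""
    results = response.get("results", {})
    blobs = results.get("deleted_blobs", 0)
    size = results.get("deleted_bytes", 0)
    return f"removed {blobs} blobs, {size} bytes"


def run_cleanup(es_url, repo):
    """Send the cleanup request and summarize what Elasticsearch removed."""
    with urllib.request.urlopen(cleanup_request(es_url, repo)) as resp:
        return summarize_cleanup(json.load(resp))
```

Calling `run_cleanup("http://localhost:9200", "azure_backup")` (both values hypothetical) would report how much was removed. If recent snapshot deletions have been succeeding, it may well report nothing, which would suggest the remaining old files are still referenced.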