Remove orphaned data from snapshots?

cmoates · May 29, 2015, 8:56pm

Over time, a number of snapshots have failed for various reasons that were outside the control of ES. I am now left with what I believe is a large number of orphaned Lucene files inside the shard directories of each of my indexes.

I've been looking over how snapshots are stored, and I believe that I understand it but wanted to see if I could get someone to confirm before I continue moving forward.

In a shard directory, I have files like __0 and __1, as well as the snapshot JSON files which have a list of files that are referenced for that particular snapshot. If I were to combine all the snapshot file lists, I should have a complete list of all files in that shard's directory to restore any of the snapshots.

The issue is that some of my shards have many files that are not referenced by any of these snapshot files. My assumption is that these are leftovers from previously failed backups, and that removing them will not impede my ability to restore the good snapshots.
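
To make the comparison concrete, here is a rough Python sketch of the check described above. It only lists candidates and deletes nothing. Several things in it are assumptions rather than confirmed facts about the repository format: that the snapshot JSON files are named `snapshot-*`, that each one has a top-level `files` list whose entries carry a `name` key such as `__0`, and that large files may be stored in pieces named like `__0.part0`. The function names are made up for illustration.

```python
import json
from pathlib import Path


def referenced_blobs(shard_dir: Path) -> set[str]:
    """Union of the file names listed by every snapshot JSON file in shard_dir.

    Assumes (unverified) that metadata files match 'snapshot-*' and hold
    {"files": [{"name": "__0", ...}, ...]}.
    """
    names: set[str] = set()
    for meta in shard_dir.glob("snapshot-*"):
        data = json.loads(meta.read_text())
        for entry in data.get("files", []):
            names.add(entry["name"])
    return names


def orphan_candidates(shard_dir: Path) -> list[str]:
    """Data files in shard_dir that no snapshot JSON file references."""
    referenced = referenced_blobs(shard_dir)
    orphans = []
    for path in shard_dir.iterdir():
        if not path.name.startswith("__"):
            continue  # skip the snapshot JSON files and anything else
        # Assumption: '__2.part0' is one piece of the file listed as '__2'.
        base = path.name.split(".part")[0]
        if base not in referenced:
            orphans.append(path.name)
    return sorted(orphans)
```

Running `orphan_candidates(Path("/path/to/shard/dir"))` on a copy of a single shard directory would give a list to inspect by hand. It should not be run while a snapshot is in progress, since files written by a running snapshot may not yet be listed in any snapshot file.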

I am going to do some small-scale testing of my theory, but testing at full scale is not feasible since my total data set is 60 TB+.

Is there anyone out there who knows about this at a more advanced level than I do? I've not had much luck finding information on this, but I have not yet dived into the source code.
