Snapshot & Restore - Missing/Corrupted Segments

Hi all,

TL;DR:

If an index segment that is referenced by snapshots becomes corrupt/missing from the snapshot repository, do you have to erase everything in the repository and start over?

Is there a way to resolve just that index's issues without impacting the other indices in the snapshots?

Background:

Our S3 repository had old segments for two indices become corrupt/missing (`no_such_file_exception` errors). All subsequent snapshots of those indices to that repo failed until we eventually emptied the repository and started over.

Does the Snapshot & Restore process have any alternatives to wiping this repo and starting again when a snapshotted segment itself becomes corrupt?

It does not seem like future snapshots will "retake" a snapshot of corrupted/missing segments, and the guidance I've found typically says to erase the repo and start over if you want to back up that index.

The only truly safe way to handle a broken repository is indeed to start again.

That said, in many cases of shard- or index-level repository corruption I expect the repository will start working again once you delete every snapshot that involves the broken index. You can keep the rest of the data in the repository by first cloning each of those snapshots, specifying `*,-broken_index` as the index list to exclude just the broken index from the clone, and then deleting the bad snapshots.
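For illustration, here's a minimal sketch of that clone-and-delete sequence using the snapshot clone API, assuming a repository named `my_repo`, a snapshot named `snap_1`, and a broken index named `broken_index` (all hypothetical names; you'd repeat this for each affected snapshot):

```
# Clone snap_1, keeping every index except the broken one
PUT _snapshot/my_repo/snap_1/_clone/snap_1_cleaned
{
  "indices": "*,-broken_index"
}

# Once the clone completes, delete the original snapshot
DELETE _snapshot/my_repo/snap_1
```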

From 8.16.0 onwards, Elasticsearch will attempt to repair the repository contents if the missing data can be reconstructed from other blobs in the repository. That release also adds an API to verify the integrity of the repository, so you can proactively look for problems like these and check that the repository contents are valid after the clone-and-delete process I suggested above.
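For example, again assuming a repository named `my_repo` (hypothetical), the integrity check added in 8.16.0 can be invoked like this:

```
# Verify the integrity of the repository contents (available from 8.16.0)
POST _snapshot/my_repo/_verify_integrity
```

The response should report any anomalies found in the repository contents, so running it after the clone-and-delete step gives some assurance that the remaining snapshots are intact.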

Thank you!