Best practices for validating Elasticsearch snapshots

We have been verifying our snapshots on a daily basis to make sure they are in a good state, i.e. contain no corrupted data. We do this by spinning up a new cluster and restoring a snapshot into it, to confirm that we have at least one good snapshot.
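For reference, the restore-based check looks roughly like the sketch below. It is a minimal illustration, not our exact tooling: the repository name `my_repo`, the snapshot name `snap-1`, and the helper names are hypothetical. The endpoints are the standard snapshot restore and cluster health APIs. The helpers only build the requests and interpret the health response, so they can be sent with any HTTP client.

```python
import json


def build_restore_request(repo, snapshot, indices="*"):
    """Build the restore call for one snapshot (repo/snapshot names are placeholders)."""
    path = f"/_snapshot/{repo}/{snapshot}/_restore"
    body = {
        "indices": indices,              # restore everything by default
        "include_global_state": False,   # only index data is being validated
    }
    return "POST", path, json.dumps(body)


def build_health_request(timeout="30m"):
    """Build a cluster health call that blocks until green or until the timeout."""
    return "GET", f"/_cluster/health?wait_for_status=green&timeout={timeout}"


def restore_succeeded(health):
    """Treat the restore as good only if the cluster reached green without timing out."""
    return health.get("status") == "green" and not health.get("timed_out", True)
```

A run would send the restore request to the throwaway cluster, poll the health request until it returns, and pass the parsed JSON response to `restore_succeeded`.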

However, as the data grows this becomes increasingly complex. We have grown to 100+ nodes and 120+ TB of data; restoring one snapshot can take up to 15 hours, and we need to spin up another production-size cluster with 100 nodes just to validate a snapshot. We could move the validation to weekly or bi-weekly, but I am wondering whether there are better approaches to validation than spinning up a prod-size cluster.
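To put those figures in perspective, here is a back-of-the-envelope calculation of the restore throughput they imply. It is only a rough sketch: it assumes decimal terabytes, takes the lower bounds (120 TB, 100 nodes) and the 15-hour worst case, and assumes the load is spread evenly across nodes. The function name is made up for illustration.

```python
def restore_throughput(total_tb, hours, nodes):
    """Return (aggregate bytes/s, per-node bytes/s) for a full restore."""
    total_bytes = total_tb * 1e12   # assumption: decimal TB
    seconds = hours * 3600
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes


agg, per_node = restore_throughput(120, 15, 100)
print(f"{agg / 1e9:.2f} GB/s aggregate, {per_node / 1e6:.1f} MB/s per node")
# prints "2.22 GB/s aggregate, 22.2 MB/s per node"
```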

I was looking forward to this feature, but the issue is now closed: "Validate snapshot via dry-run restore". Does anything similar exist?

Based on what I've read, Elasticsearch performs a checksum check on each snapshot, but we recently still ended up with a corrupted index, and hence we would like to keep doing this additional validation.
