Configured S3 repository on multiple clusters with write-access, data corrupted

I added the same S3 repository used by one of my clusters to another cluster, and I forgot to give it read-only access. The documentation states that this can cause data corruption:

"If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository . Having multiple clusters write to the repository at the same time risks corrupting the contents of the repository."

I need to restore this data, is this possible or is the data lost forever?
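For reference, the registration I should have done on the second cluster looks roughly like this; the repository name, bucket, and base path below are placeholders rather than my real values:

  PUT _snapshot/my_repo
  {
    "type": "s3",
    "settings": {
      "bucket": "my-bucket",
      "base_path": "snapshots",
      "readonly": true
    }
  }

The only difference from what I actually did is the "readonly": true setting, which is the part I forgot.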

We can't offer any guarantees, but you might be lucky. All reasonably recent versions of ES try quite hard to avoid repository corruption in this situation. The only way to find out for sure is to try it, though.

What exactly should I try doing? Adding the repository again? My cluster is version 7.17 btw.

Is updating the cluster an option?

What is the exact error message you're seeing?

I got this error once:

Could not read repository data because the contents of the repository do not match its expected state. This is likely the result of either concurrently modifying the contents of the repository by a process other than this cluster or an issue with the repository's underlying storage. The repository has been disabled to prevent corrupting its contents. To re-enable it and continue using it please remove the repository from the cluster and add it again to make the cluster recover the known state of the repository from its physical contents.

Then I removed and re-added the repository, which led to all the snapshots disappearing, and they haven't come back since.

The error also disappeared.
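Concretely, the remove/re-add and the check I did looked roughly like this (the repository and bucket names are placeholders for my real ones):

  DELETE _snapshot/my_repo

  PUT _snapshot/my_repo
  {
    "type": "s3",
    "settings": {
      "bucket": "my-bucket",
      "base_path": "snapshots"
    }
  }

  GET _snapshot/my_repo/_all

The GET request now comes back with no snapshots at all.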

I'll detail exactly what I did with the two clusters:

  1. Created a new cluster and connected it to the S3 bucket that's being used as a repository by the old cluster. The same S3 bucket is now a repository on both clusters.

  2. Took a snapshot from the new cluster into the repository, expecting it to appear on the old cluster. It did not appear.

  3. At some point I got the above error on the old cluster, leading me to remove and re-add the S3 repository on both clusters.

  4. Now neither cluster can discover any of the snapshots.

Are you sure you've configured the repository the same as before, with the same bucket and base path and so on? If so, unfortunately if ES cannot list the snapshots there then it won't be able to restore anything. But I would not expect having two clusters writing to the repo to do such comprehensive damage to the repository so quickly.
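You can compare what each cluster has registered with something like this (the repository name here is just an example):

  GET _snapshot/my_repo

  POST _snapshot/my_repo/_verify

The first request returns the bucket and base_path the cluster is actually using; the second checks that all nodes can reach the repository.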

Thanks a lot David, after your tip I found out that my path was incorrect. I've corrected it now and I can see all the snapshots.

The issue now is that when I try to restore them I get this error:

Unable to restore snapshot

[*****:application-logs-2023.01.01-90engxybs_qfy_lrnykgoa/bvczrwHNRq2KVZljI75wWw] is missing

Right, that's more like the kind of error I'd expect after a repository had multiple writers. You'll need to choose a different snapshot to restore.
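For example, a restore of one of the snapshots that is still intact would look roughly like this, with placeholder snapshot and index names:

  POST _snapshot/my_repo/application-logs-2023.01.02/_restore
  {
    "indices": "application-logs-2023.01.02",
    "include_global_state": false
  }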

So recovering that snapshot is not possible?

Likely not. It depends on the exact details of the error message (and stack trace), but this is the sort of thing that can happen when one cluster deletes a snapshot while another cluster is writing to the repository.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.