We recently encountered some strange behaviour with our S3 snapshots. We back up indices to Wasabi using the S3 repository plugin, with several repositories and SLM policies registered. Each repository is registered on one cluster with full access and on other clusters with the readonly setting. In the cases described below we have full access on an Elasticsearch 7.8.0 cluster and readonly access on a 7.9.1 cluster. The SLM policies are set up and managed via Kibana.
This setup worked very well for quite some time, until I added a retention rule to one of the policies (let's call it p-A) to keep a maximum of 10 snapshots. Policy p-A uses repository r-A. The next day I discovered that snapshots had been deleted (at the time of the retention schedule) from repository r-B, which is managed by policy p-B (something completely different). While trying to find the cause, I realized that both repositories r-A and r-B are registered against the same S3 bucket (both using the default base_path). By the way, there is nothing in the snapshot or S3 repository plugin docs saying this is not possible.
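For illustration, the two repositories were registered roughly like this (bucket name and endpoint details are placeholders, not our real settings); note that neither request sets base_path, so both repositories end up writing to the root of the same bucket:

```
PUT _snapshot/r-A
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}

PUT _snapshot/r-B
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}
```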
What puzzled me, apart from the fact that snapshots from a different policy and repository were deleted at all, was that the youngest snapshots in r-B were the ones deleted. (To be clear: I cannot prove it was done by the retention policy; it is a working hypothesis, since the timing fits and no process other than SLM triggers anything snapshot-related.) One possible explanation: p-A was set up later than p-B and runs more often, so a snapshot that p-A considers old could still be quite new from r-B's point of view.
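The retention rule on p-A looks roughly like this (the schedule and snapshot name pattern here are placeholders; only the retention part is the setting in question):

```
PUT _slm/policy/p-A
{
  "schedule": "0 30 1 * * ?",
  "name": "<p-a-{now/d}>",
  "repository": "r-A",
  "retention": {
    "max_count": 10
  }
}
```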
The next strange thing: after registering r-B on our new 7.9.1 cluster as a readonly repository, I get different results when listing snapshots in Kibana. The older 7.8.0 cluster shows me snapshots for p-B in r-B taken after the time of the deletion, while the new 7.9.1 cluster with the readonly repository r-B (r-A is not registered on this cluster) does not show them. Yet even with those newer snapshots missing, the new cluster shows more snapshots for the repository overall (113 vs. 101 at the moment), so despite some obviously missing, it lists more.
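To rule out Kibana as the source of the discrepancy, the listings can be compared directly against the snapshot API on each cluster (this is just the standard get-snapshot call, nothing cluster-specific assumed):

```
GET _snapshot/r-B/_all
```

Running this on both the 7.8.0 and the 7.9.1 cluster and diffing the snapshot names should show whether the two clusters really see different repository contents or whether the UI is filtering something.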
And one more thing: in Kibana on the 7.8.0 cluster I found a snapshot that was listed twice, once in r-A and once in r-B. Size, timestamp, content, ID: everything is identical in both repositories.
Trying to understand the cause of all this, I took a look at the source code of the S3 repository plugin.
I am not sure I understood it correctly, but it looks like S3Repository relies on S3BlobStore for read/write operations, and I don't see S3BlobStore actually using the repository name, only the bucket. Is it possible that two repositories in one bucket cannot be distinguished by the S3 repository plugin?
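To make the suspicion concrete, here is a toy sketch of my reading of the code. This is not the plugin's actual implementation (the function name and shape are mine); it only illustrates the hypothesis that object keys are derived from bucket and base_path alone, never from the repository name:

```python
def object_key(bucket: str, base_path: str, blob_name: str) -> str:
    # Hypothesis: the S3 object key is built only from the bucket and
    # base_path; the repository *name* never appears in the key. With
    # an empty base_path, two repositories registered against the same
    # bucket would therefore address the exact same objects.
    prefix = base_path.strip("/") + "/" if base_path else ""
    return f"s3://{bucket}/{prefix}{blob_name}"

# Same bucket, no base_path: r-A and r-B resolve to identical keys.
print(object_key("my-bucket", "", "index-42"))     # s3://my-bucket/index-42
# A distinct base_path per repository keeps the keys apart.
print(object_key("my-bucket", "r-A", "index-42"))  # s3://my-bucket/r-A/index-42
```

If this reading is right, it would explain both the cross-repository deletions and the duplicate snapshot listing: the two repositories are literally the same set of objects.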
My questions would be:
- can the S3 repository plugin handle more than one repository in a bucket when no explicit base_path is set?
- if not, could this explain the deletion of snapshots from a different repository that is registered to the same bucket?
- what could explain the different number of snapshots in a repository when it is registered on different clusters?
- would you generally recommend one bucket per repository? Or, going further, one repository per policy?
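In case it matters for the answer: the workaround I am considering is to keep both repositories in the same bucket but give each an explicit, distinct base_path so their objects can no longer overlap (names are placeholders):

```
PUT _snapshot/r-A
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "base_path": "r-A"
  }
}
```

I would do the same for r-B with "base_path": "r-B", but I am unsure whether re-registering an existing repository with a new base_path is safe for the snapshots already in the bucket.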
Thanks in advance for any help; I am desperately trying to regain my trust in our snapshot backups.