Yesterday I created a small 3-node cluster running 5.1 so I could start testing a few things. One of them was using the repository-s3 plugin to access the S3 snapshots from my production 2.4 cluster. When I reached that point yesterday, I only had time to list all the snapshots, which worked fine.
Last night Curator ran and deleted the snapshot from 1 year ago, 2015122
This morning I tried to list the snapshots again because I wanted to go ahead and do a restore. Listing the snapshots now fails:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_missing_exception",
        "reason" : "[prod-backup:curator-20151222054502/curator-20151222054502] is missing"
      }
    ],
    "type" : "snapshot_exception",
    "reason" : "[prod-backup:curator-20151222054502/curator-20151222054502] Snapshot could not be read",
    "caused_by" : {
      "type" : "snapshot_missing_exception",
      "reason" : "[prod-backup:curator-20151222054502/curator-20151222054502] is missing",
      "caused_by" : {
        "type" : "no_such_file_exception",
        "reason" : "Blob object [snap-curator-20151222054502.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 4AC6D60C40E0CED1)"
      }
    }
  },
  "status" : 500
}
This is an issue with having both a 2.x and a 5.x cluster connected to the same repository and deleting a snapshot from it. In 5.x, we upgraded our file formats to use a generational format (you will notice blobs in your S3 repository named index-N, where N is an incrementing number). The side effect is that when you delete a snapshot in 2.x, it only updates the index file, not the latest index-N file that 5.x understands. Your 5.x cluster therefore still believes the snapshot is there, because the snapshot is still listed in the index-N file. When it then goes to look for that snapshot's data, it doesn't find it, because the 2.x cluster's delete operation wiped that data out of the repository.
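To make the mismatch concrete, here is a toy sketch of the state described above. It does not reflect the real blob formats; the listing is modeled as a plain list of names, the function name stale_snapshots is made up for illustration, and the second snapshot name is hypothetical.

```python
# Toy model only: real repositories store this information in blobs
# with their own formats, which are not reproduced here.

def stale_snapshots(listed_names, blob_keys):
    """Return names still listed whose snap-<name>.dat blob no longer exists."""
    present = set(blob_keys)
    return [name for name in listed_names if f"snap-{name}.dat" not in present]


# What the 5.x cluster believes exists (its index-N view, simplified).
# "curator-example" is a hypothetical snapshot name.
listed_by_5x = ["curator-20151222054502", "curator-example"]

# What is actually in the bucket after the 2.x cluster deleted one snapshot.
blobs = ["index", "index-0", "snap-curator-example.dat"]

stale = stale_snapshots(listed_by_5x, blobs)
print(stale)
```

Any name this returns is a snapshot that the 5.x cluster would try to read and fail on, as in the error above.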
What we advise is that when two clusters of different versions point to the same repository, you only write to the repository from the higher-versioned cluster. The lower-versioned cluster can still access the repository, but should do so in a read-only fashion. Alternatively, in your setup, where 2.x is your production cluster, you can create a 5.x cluster pointing to the same repository, but then you cannot execute snapshot deletions safely.
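As a rough sketch of the read-only approach, the repository can be registered with a readonly flag on the cluster that should not write. This assumes the S3 repository plugin version in use supports the readonly setting, so check its documentation first. The bucket name is hypothetical, and the snippet only builds the request body rather than sending it.

```python
import json


def repository_body(bucket, readonly):
    """Build a registration body for an S3 snapshot repository."""
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,      # hypothetical bucket name, see below
            "readonly": readonly,  # assumed to be supported by the plugin in use
        },
    }


body = repository_body("my-snapshot-bucket", readonly=True)
print(json.dumps(body, indent=2))
```

The resulting body would be sent as a PUT to _snapshot/prod-backup on the cluster that should only read from the repository.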
Also note that once you have finished experimenting with the 5.x cluster, you should delete all index-N blobs (but not the index blob) in your S3 repository. That way, the next time you connect a 5.x cluster to it (presumably to upgrade your ES cluster), it will generate new index-N blobs from the current 2.x index file and so pick up all the latest snapshot data from 2.x. Right now, any snapshot writes from the 2.x cluster will not be seen by the 5.x cluster.
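A minimal sketch of picking out the blobs to remove, assuming you have already listed the keys in the bucket with whatever S3 tool you use. The function name blobs_to_delete and the sample keys are illustrative; the actual deletion is deliberately left out so the selection can be reviewed first.

```python
import re

# Matches only generational blobs such as index-0 or index-12,
# never the plain "index" blob that 2.x maintains.
GENERATIONAL = re.compile(r"index-\d+")


def blobs_to_delete(keys, base_path=""):
    """Return keys directly under base_path that are named index-N."""
    prefix = base_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        # fullmatch rejects nested keys like "indices/foo/index-1".
        if GENERATIONAL.fullmatch(name):
            selected.append(key)
    return selected


# Illustrative key listing, not taken from a real bucket.
keys = ["index", "index-0", "index-12", "snap-x.dat", "indices/foo/index-1"]
print(blobs_to_delete(keys))
```

Matching on the full blob name at the repository root is a deliberately conservative choice: anything that is not exactly index-N is left alone.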