Missing Snapshot Error


(Matthew J Purcell) #1

I'm trying to use Curator to delete all snapshots older than 100 days in my repository. I noticed it was failing with the error message:

Failed to complete action: delete_snapshots. <class 'curator.exceptions.FailedExecution'>: Unable to get snapshot information from repository: dcselasticsnapshot. Error: TransportError(500, 'snapshot_exception', '[dcselasticsnapshot:citydirectory_ocrpages/XGftDvzIRPGe_wMfN6FsSw] is missing')

Looking into this further, I went and tried to list all snapshots that are currently in the repo:

GET /_cat/snapshots/dcselasticsnapshot?v&s=id

This also failed, which tells me the problem is not with Curator but with the snapshot API and the metadata associated with it. Intrigued, I tried to delete the snapshot and, not surprisingly, got the following:

{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_missing_exception",
        "reason": "[dcselasticsnapshot:citydirectory_ocrpages/XGftDvzIRPGe_wMfN6FsSw] is missing"
      }
    ],
    "type": "snapshot_missing_exception",
    "reason": "[dcselasticsnapshot:citydirectory_ocrpages/XGftDvzIRPGe_wMfN6FsSw] is missing",
    "caused_by": {
      "type": "no_such_file_exception",
      "reason": "Blob object [snap-XGftDvzIRPGe_wMfN6FsSw.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 26ED0010FAD1957E; S3 Extended Request ID: mC8EZfdWuubobpZndQGLi38t84Ni9EmDMc6kJzburYBGLhZijEtu5BUPluC4aq5hDHd5HcEN2l0=)"
    }
  },
  "status": 404
}

I tried to create a new snapshot with the missing name in order to delete it, hoping it would override the existing snapshot somehow. I get the following error, saying the snapshot already exists!

{
  "error": {
    "root_cause": [
      {
        "type": "invalid_snapshot_name_exception",
        "reason": "[dcselasticsnapshot:citydirectory_ocrpages] Invalid snapshot name [citydirectory_ocrpages], snapshot with the same name already exists"
      }
    ],
    "type": "invalid_snapshot_name_exception",
    "reason": "[dcselasticsnapshot:citydirectory_ocrpages] Invalid snapshot name [citydirectory_ocrpages], snapshot with the same name already exists"
  },
  "status": 400
}

Is there a way to delete this so that the Curator script runs properly, or at least so that listing all the snapshots in the repo works? I believe this is a case of a missing metadata pointer. I tried looking in the repo for the actual data associated with the snapshot, but couldn't find it. If I were able to find it, would deleting the individual data/meta for the snapshot clear up this problem? FYI, I'm trying to avoid deleting the entire repo and starting over, as there are many current and useful backups located there. Thanks for any and all help.


(Matthew J Purcell) #2

4 node cluster, 3 data, 1 master. s3 backup is on AWS - bucket is named 'dcselasticsnapshot'.


(Aaron Mildenstein) #3

A 500 error indicates something is wrong at the server level. Elasticsearch is perhaps having difficulty communicating with S3.

What happens if you run:

GET /_cat/snapshots/dcselasticsnapshot/

...without the extra stuff?


(Aaron Mildenstein) #4

For that matter, what do you get if you run:

GET /_cat/repositories

(Matthew J Purcell) #5

Hey @theuntergeek, thanks for responding. When I run:

GET /_cat/snapshots/dcselasticsnapshot/

I receive the following response:

{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_missing_exception",
        "reason": "[dcselasticsnapshot:citydirectory_ocrpages/XGftDvzIRPGe_wMfN6FsSw] is missing"
      }
    ],
    "type": "snapshot_exception",
    "reason": "[dcselasticsnapshot:citydirectory_ocrpages/XGftDvzIRPGe_wMfN6FsSw] Snapshot could not be read",
    "caused_by": {
      "type": "snapshot_missing_exception",
      "reason": "[dcselasticsnapshot:citydirectory_ocrpages/XGftDvzIRPGe_wMfN6FsSw] is missing",
      "caused_by": {
        "type": "no_such_file_exception",
        "reason": "Blob object [snap-XGftDvzIRPGe_wMfN6FsSw.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: F53FFCB60B165355; S3 Extended Request ID: HLrbreo3Y/a0w1mHTqtLx83AZH58Mm9li1lEjRzK+FsMAWfEqOYLM7jfIrEVuyERjPf5c1dJSZU=)"
      }
    }
  },
  "status": 500
}
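The response above nests its causes three levels deep, and the most useful detail (the S3 key that was not found) sits at the bottom. As a rough sketch only, not something posted in this thread, the `caused_by` chain could be walked like this in Python. The function name `cause_chain` is made up for illustration, and the sample is an abbreviated copy of the response above with the reasons shortened.

```python
import json


def cause_chain(error_body):
    """Return [(type, reason), ...] from the top-level error down to the innermost caused_by."""
    chain = []
    node = error_body.get("error")
    while isinstance(node, dict):
        chain.append((node.get("type"), node.get("reason")))
        node = node.get("caused_by")
    return chain


# Abbreviated copy of the response above; reasons are shortened for readability.
sample = json.loads("""
{
  "error": {
    "type": "snapshot_exception",
    "reason": "Snapshot could not be read",
    "caused_by": {
      "type": "snapshot_missing_exception",
      "reason": "is missing",
      "caused_by": {
        "type": "no_such_file_exception",
        "reason": "Blob object [snap-XGftDvzIRPGe_wMfN6FsSw.dat] not found"
      }
    }
  },
  "status": 500
}
""")

# Print one line per level, indented by depth.
for depth, (etype, reason) in enumerate(cause_chain(sample)):
    print("  " * depth + f"{etype}: {reason}")
```

The last entry in the chain is the one that names the missing blob, which is what points at the bucket contents rather than at Curator.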

When I run:

GET /_cat/repositories?v

I receive the following response:

id                 type
dcselasticsnapshot   s3

Let me know if there's anything else you need. I look forward to hearing back from you, thanks! And FYI I'm currently still putting new snapshots in this s3 repo, with success in restoring them in other clusters.


(Aaron Mildenstein) #6

You may be stuck there. It sounds to me like something that the repository metadata expects to find in the S3 bucket was changed or did not propagate properly. I personally know of no way to correct that. If it were me, I'd start with a fresh repository in a different bucket.
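To illustrate the kind of mismatch described here, the sketch below compares the snapshot UUIDs a repository is expected to hold against the keys actually present in the bucket. It is a hypothetical helper, not a fix: the `snap-<uuid>.dat` naming is taken from the error messages in this thread, while `missing_snapshot_blobs`, the second UUID, and the way the two lists are obtained are all assumptions.

```python
def missing_snapshot_blobs(snapshot_uuids, blob_keys):
    """Return the snap-<uuid>.dat names that are expected but absent from blob_keys."""
    present = set(blob_keys)
    missing = []
    for uuid in snapshot_uuids:
        name = f"snap-{uuid}.dat"
        if name not in present:
            missing.append(name)
    return missing


# Hypothetical inputs: only the first UUID comes from this thread.
expected = ["XGftDvzIRPGe_wMfN6FsSw", "aaaaaaaaaaaaaaaaaaaaaa"]
in_bucket = ["snap-aaaaaaaaaaaaaaaaaaaaaa.dat"]

print(missing_snapshot_blobs(expected, in_bucket))
```

This only reports the gap. It does not say whether the repository can be repaired once the gap is known.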


(Matthew J Purcell) #7

Oh no, I had a feeling you were going to say that :no_mouth:. Do I need to create the repo in a different bucket in order to fix the problem, or can I just delete what's in the existing bucket, either manually or via commands to the cluster?


(Aaron Mildenstein) #8

If you delete the contents of the bucket, and also delete the repository from Elasticsearch, you should be able to re-use it by re-creating the repository.
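As a minimal sketch of that sequence, assuming the bucket name given earlier in the thread and default repository settings, the two Elasticsearch requests involved could be laid out as below. The helper name `recreate_repo_requests` is made up, and nothing is sent to a cluster; the code only builds and prints the requests. Emptying the bucket happens outside Elasticsearch (for example with S3 tooling) and is not shown. Doing it between the two requests is an assumption about ordering, not something stated above.

```python
import json


def recreate_repo_requests(repo, bucket):
    """Build (method, path, body) tuples for unregistering and re-registering an S3 repository."""
    return [
        # Unregisters the repository from the cluster; the bucket contents are not touched.
        ("DELETE", f"/_snapshot/{repo}", None),
        # Registers the repository again, pointing at the (now emptied) bucket.
        ("PUT", f"/_snapshot/{repo}", {"type": "s3", "settings": {"bucket": bucket}}),
    ]


# Repository and bucket names as given earlier in this thread.
for method, path, body in recreate_repo_requests("dcselasticsnapshot", "dcselasticsnapshot"):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Any additional repository settings used by the original registration (region, base path and so on) would need to be carried over by hand.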

