Migration from ES 2.x to ES 5.x

We created a new ES 5.x cluster and added a backup repository to restore from (the one used by the old ES 2.x cluster).
We restored from our snapshot and everything was fine. We have a retention period for our ES snapshots.
We use S3 storage and the repository-s3 plugin for backups.
After the retention job deleted this snapshot, we got an error:

{
    "error": {
        "root_cause": [
            {
                "type": "snapshot_missing_exception",
                "reason": "[s3_repository:snapshot_201701040203/snapshot_201701040203]  is missing"
            }
        ],
        "type": "snapshot_exception",
        "reason": "[s3_repository:snapshot_201701040203/snapshot_201701040203] Snapshot could not be read",
        "caused_by": {
            "type": "snapshot_missing_exception",
            "reason": "[s3_repository:snapshot_201701040203/snapshot_201701040203]  is missing",
            "caused_by": {
                "type": "no_such_file_exception",
                "reason": "Blob object [snap-snapshot_201701040203.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: DB308DF310809F58)"
            }
        }
    },
    "status": 500
}

Full log:
[2017-01-30T13:18:57,344][WARN ][r.suppressed ] path: /_snapshot/s3_repository/_all, params: {repository=s3_repository, snapshot=_all}
org.elasticsearch.snapshots.SnapshotException: [s3_repository:snapshot_201701040203/snapshot_201701040203] Snapshot could not be read
at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:187) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:122) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:50) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:86) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$3.doRun(TransportMasterNodeAction.java:170) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) [elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.1.2.jar:5.1.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: org.elasticsearch.snapshots.SnapshotMissingException: [s3_repository:snapshot_201701040203/snapshot_201701040203] is missing
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:566) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:182) ~[elasticsearch-5.1.2.jar:5.1.2]
... 9 more
Caused by: java.nio.file.NoSuchFileException: Blob object [snap-snapshot_201701040203.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 38087D5B4A20B627)
at org.elasticsearch.cloud.aws.blobstore.S3BlobContainer.readBlob(S3BlobContainer.java:92) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:100) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:89) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:560) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:182) ~[elasticsearch-5.1.2.jar:5.1.2]

I have tried removing this snapshot repository and deleting all indices, but if I add the repository again, I get the same error.
How can I restore ES? Where does ES get its information about the old snapshot?

Best regards.

Does your old 2.x cluster still connect to the same repository? Or did you stop your 2.x cluster, start up a 5.x cluster, and have it point to the same repository that you were snapshotting to from your 2.x cluster? In other words, at any point did both your 2.x cluster and 5.x cluster talk to the same repository simultaneously?

Hi!
Yes, we are working on upgrading our ES from 2.x to 5.x, and we use the same repository for both.
We found that the problem really is with the S3 repository, which is strange. Perhaps it happened because we had started restoring the ES 5.x cluster from this snapshot when the retention job started removing it; who knows.

Best regards.

My guess is, since you had both 2.x and 5.x clusters pointing to the same repository, you probably deleted a snapshot from the 2.x cluster. 2.x and 5.x have different blob formats for snapshotting, so when you deleted the snapshot from the 2.x cluster, the files themselves were deleted, but the index file which contains a list of valid snapshots was only updated in its 2.x format. The 5.x cluster therefore has no knowledge that the snapshot in question was deleted. When it tries to load all of the snapshots, it thinks this snapshot still exists, but then finds that no files for that snapshot exist (because they were deleted from the 2.x cluster).
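
To make that failure mode concrete, here is a toy model in Python. It is only a sketch of the reasoning above: the class, its attributes, and the snapshot names are all hypothetical and do not reflect Elasticsearch internals or the real on-disk layout.

```python
# Toy model only: every name here is illustrative, not Elasticsearch code.

class ToyRepository:
    """One shared set of snapshot blobs with two separate snapshot lists."""

    def __init__(self, snapshots):
        self.blobs = {f"snap-{name}.dat" for name in snapshots}
        self.index_2x = list(snapshots)  # list maintained by the 2.x cluster
        self.index_5x = list(snapshots)  # list maintained by the 5.x cluster

    def delete_from_2x(self, name):
        """Delete the blob and update only the 2.x list."""
        self.blobs.discard(f"snap-{name}.dat")
        self.index_2x.remove(name)
        # index_5x is deliberately left untouched, as described above.

    def list_from_5x(self):
        """Read every snapshot the 5.x list believes in."""
        for name in self.index_5x:
            if f"snap-{name}.dat" not in self.blobs:
                raise FileNotFoundError(f"[{name}] is missing")
        return list(self.index_5x)


if __name__ == "__main__":
    repo = ToyRepository(["snapshot_a", "snapshot_b"])
    repo.delete_from_2x("snapshot_a")
    try:
        repo.list_from_5x()
    except FileNotFoundError as err:
        print("5.x listing failed:", err)
```

The point of the sketch is that the 5.x listing fails on a snapshot it still believes in, even though the delete itself succeeded on the 2.x side.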

When connecting a 2.x and a 5.x cluster to the same repository, make sure the 2.x cluster connects to it in a read_only fashion. That will prevent you from encountering such problems. I will add a note to the docs to make it clear that a 2.x cluster should open a repository as read_only when also connected to from a 5.x cluster.
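
As a rough illustration, the Python snippet below builds the request body for registering the repository as read-only on the 2.x side. The bucket name is a placeholder, and the snippet only prints the request rather than sending it. The setting is spelled `readonly` in the snapshot documentation; whether your exact 2.x S3 plugin version honors it should be checked against the docs for that version.

```python
import json


def readonly_repository_body(bucket):
    """Build a repository registration body with the readonly flag set.

    The bucket name is supplied by the caller; nothing is sent to a cluster.
    """
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "readonly": True,  # the 2.x cluster must not write to or delete from the repo
        },
    }


if __name__ == "__main__":
    # "my-backup-bucket" is a placeholder, not a value from this thread.
    print("PUT /_snapshot/s3_repository")
    print(json.dumps(readonly_repository_body("my-backup-bucket"), indent=2))
```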

Hope this helps!

Yeah, you are probably right, but I tried creating a new ES cluster and adding this repository, and I got the same error on the new cluster. So some information must be stored somewhere in the backup repository.

Yes, this error would happen when you try loading the repository's index file, which happens before trying to create a snapshot.

I would recommend the following to get you out of your predicament:

  1. Disconnect your two clusters from the repository.
  2. In your repository, at the top level directory, you will see some index-N files, where N is some number... for example, index-0, index-1, etc. Delete all of the index-N files, but do not delete the index file which does not have a -N after it. So keep index but delete all index-N blobs.
  3. Create the repository in your 5.x cluster. This should generate fresh index-N files for you, based on the current state of the repository.
  4. Now you can also create the repository in the 2.x cluster, just make sure you create it as readonly: see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_shared_file_system_repository for documentation on the readonly flag.
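
For step 2, a small Python helper like the one below can pick out which top-level blobs to delete from a listing you have already fetched (for example, from your S3 console or CLI). It is a sketch that assumes the names are passed in as plain strings; the function name and the sample listing are made up for illustration, and the helper does not talk to S3 or delete anything itself.

```python
import re

# Matches index-0, index-1, index-42, ... but not the plain "index" blob.
INDEX_N = re.compile(r"index-\d+")


def stale_index_blobs(blob_names):
    """Return only the index-N blobs from a list of top-level blob names."""
    return [name for name in blob_names if INDEX_N.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical listing of the repository's top-level directory.
    listing = ["index", "index-0", "index-1", "snap-example.dat", "indices"]
    for name in stale_index_blobs(listing):
        print("delete:", name)
```

Review the printed names before deleting anything, and keep the plain index blob as described in step 2.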

Thank you. It works for me.)

This also worked for me. In addition I had to delete the index.last file for it to work.
