Various issues with S3 Snapshots


#1

Hello,

I'm using the repository-s3 plugin to take snapshots to S3 and I'm running into various issues. After running this live for some months with a growing number of snapshots, the snapshot list calls to Elasticsearch got slower and slower.

Right now I cannot even get the list of snapshots. I have two repositories and both fail with different error messages:

curl -s "http://elasticsearch:9200/_snapshot/short_term/_all?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_missing_exception",
        "reason" : "[short_term:payments-v1-snapshot-20170722190531/IB0DbWCnT_6urghIGRjfpA]  is missing"
      }
    ],
    "type" : "snapshot_exception",
    "reason" : "[short_term:payments-v1-snapshot-20170722190531/IB0DbWCnT_6urghIGRjfpA] Snapshot could not be read",
    "caused_by" : {
      "type" : "snapshot_missing_exception",
      "reason" : "[short_term:payments-v1-snapshot-20170722190531/IB0DbWCnT_6urghIGRjfpA]  is missing",
      "caused_by" : {
        "type" : "no_such_file_exception",
        "reason" : "Blob object [snap-IB0DbWCnT_6urghIGRjfpA.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: FAAADD99ED71753B)"
      }
    }
  },
  "status" : 500
}
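
For readers scanning a response like the one above, the useful part is the innermost "caused_by" entry plus the snapshot name and UUID embedded in the reason strings. The following is a minimal sketch of pulling those out, assuming the error body has the nested shape shown here; the helper names `innermost_cause` and `snapshot_reference` are made up for illustration and are not part of any Elasticsearch client.

```python
import re

# Matches the "[repository:snapshot-name/uuid]" prefix seen in the reason strings.
# Assumption: the UUID never contains "/" or "]".
SNAPSHOT_REF = re.compile(
    r"\[(?P<repository>[^:\]]+):(?P<name>[^/\]]+)/(?P<uuid>[^/\]]+)\]"
)


def innermost_cause(error):
    """Follow nested "caused_by" objects and return the deepest one."""
    cause = error
    while isinstance(cause.get("caused_by"), dict):
        cause = cause["caused_by"]
    return cause


def snapshot_reference(reason):
    """Return repository, name and uuid found in a reason string, or None."""
    match = SNAPSHOT_REF.search(reason or "")
    return match.groupdict() if match else None


# The "error" object from the response above, with the middle level shortened.
error = {
    "type": "snapshot_exception",
    "reason": "[short_term:payments-v1-snapshot-20170722190531/"
              "IB0DbWCnT_6urghIGRjfpA] Snapshot could not be read",
    "caused_by": {
        "type": "snapshot_missing_exception",
        "caused_by": {
            "type": "no_such_file_exception",
            "reason": "Blob object [snap-IB0DbWCnT_6urghIGRjfpA.dat] not found",
        },
    },
}

print(innermost_cause(error)["type"])
print(snapshot_reference(error["reason"]))
```

Read this way, the response says that the repository still lists a snapshot whose `snap-<uuid>.dat` blob no longer exists in the bucket.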

I'll attach the log in a reply, since I seem to have reached the post size limit.

The other repository fails like this:

curl -s "http://elasticsearch:9200/_snapshot/long_term/_all?pretty"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[SVVyQPF][10.127.1.203:9300][cluster:admin/snapshot/get]"
      }
    ],
    "type" : "null_pointer_exception",
    "reason" : null
  },
  "status" : 500
}

with this log on the server:

[2017-07-24T16:07:49,882][WARN ][r.suppressed             ] path: /_snapshot/long_term/_all, params: {pretty=, repository=long_term, snapshot=_all}
org.elasticsearch.transport.RemoteTransportException: [SVVyQPF][10.127.1.203:9300][cluster:admin/snapshot/get]
Caused by: java.lang.NullPointerException

These look like two different issues to me, but I'm not sure how to proceed in either case.


#2

This is the log from the first call:

[2017-07-24T16:06:34,255][WARN ][r.suppressed             ] path: /_snapshot/short_term/_all, params: {pretty=, repository=short_term, snapshot=_all}
org.elasticsearch.transport.RemoteTransportException: [SVVyQPF][10.127.1.203:9300][cluster:admin/snapshot/get]
Caused by: org.elasticsearch.snapshots.SnapshotException: [short_term:payments-v1-snapshot-20170722190531/IB0DbWCnT_6urghIGRjfpA] Snapshot could not be read
	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:197) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:136) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:55) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.0.jar:5.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.snapshots.SnapshotMissingException: [short_term:payments-v1-snapshot-20170722190531/IB0DbWCnT_6urghIGRjfpA]  is missing
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:591) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:191) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:136) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:55) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.0.jar:5.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
Caused by: java.nio.file.NoSuchFileException: Blob object [snap-IB0DbWCnT_6urghIGRjfpA.dat] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: FAAADD99ED71753B)
	at org.elasticsearch.repositories.s3.S3BlobContainer.readBlob(S3BlobContainer.java:87) ~[?:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:103) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:89) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:585) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:191) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:136) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:55) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.0.jar:5.5.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.0.jar:5.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
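
The trace shows the whole listing failing because a single snapshot could not be read. The get-snapshot API documents an `ignore_unavailable` parameter for returning only the snapshots that are currently available; whether it gets past this particular failure on 5.5.0 was not tested in this thread, so treat the following as a sketch to try. The helper name `snapshot_list_url` is hypothetical, and the host and repository are the ones from the calls above.

```python
from urllib.parse import quote, urlencode


def snapshot_list_url(base_url, repository, ignore_unavailable=True):
    """Build the snapshot list URL for one repository (no request is sent)."""
    params = {"pretty": "true"}
    if ignore_unavailable:
        # Ask Elasticsearch to skip snapshots it cannot read instead of failing.
        params["ignore_unavailable"] = "true"
    path = f"/_snapshot/{quote(repository, safe='')}/_all"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


print(snapshot_list_url("http://elasticsearch:9200", "short_term"))
```

The printed URL can be passed to curl in the same way as the original calls.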

#3

Regarding the "short_term" repository: I fixed it by manually editing the index file in the repository and removing the entry for the missing snapshot mentioned above.
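
For anyone attempting the same manual fix, here is a rough sketch of the edit on the parsed JSON. The layout is an assumption, not something shown in this thread: a top-level "snapshots" list plus an "indices" map whose values each carry their own "snapshots" list, with entries that are either objects holding a "uuid" key or bare UUID strings. Check the real file before relying on it, and keep a backup copy of the original. The function name `drop_snapshot` is made up for illustration.

```python
import copy


def drop_snapshot(repository_data, uuid):
    """Return a copy of the index data with every reference to `uuid` removed."""

    def keep(entry):
        # Entries may be objects with a "uuid" key or bare UUID strings (assumed).
        entry_uuid = entry.get("uuid") if isinstance(entry, dict) else entry
        return entry_uuid != uuid

    cleaned = copy.deepcopy(repository_data)
    cleaned["snapshots"] = [s for s in cleaned.get("snapshots", []) if keep(s)]
    for index in cleaned.get("indices", {}).values():
        index["snapshots"] = [s for s in index.get("snapshots", []) if keep(s)]
    return cleaned


# Hypothetical file content; only the first UUID comes from the error above.
data = {
    "snapshots": [
        {"name": "payments-v1-snapshot-20170722190531",
         "uuid": "IB0DbWCnT_6urghIGRjfpA"},
        {"name": "some-other-snapshot", "uuid": "placeholder-uuid"},
    ],
    "indices": {
        "payments-v1": {
            "id": "placeholder-index-id",
            "snapshots": [
                {"name": "payments-v1-snapshot-20170722190531",
                 "uuid": "IB0DbWCnT_6urghIGRjfpA"},
                {"name": "some-other-snapshot", "uuid": "placeholder-uuid"},
            ],
        }
    },
}

cleaned = drop_snapshot(data, "IB0DbWCnT_6urghIGRjfpA")
print([s["name"] for s in cleaned["snapshots"]])
```

The sketch leaves an index in the map even if none of its snapshots remain, and it does not touch any other blobs in the bucket.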

