Cannot list snapshots in S3 repository (previously working fine)

(Joe Littlejohn) #1


I've been using the repository-s3 plugin with Elasticsearch 5.3.3 happily for months. A couple of days ago, our regular snapshot restore process started failing. It shows a timeout attempting to read the list of snapshots in the repository.

I can still see the repository listed when I list repositories using this URL:


but when I try to list the snapshots available in the repository like this:


The request just hangs indefinitely. I see no logging at all on the ES host in /var/log/elasticsearch/elasticsearch.log.
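
For anyone reproducing this, here is a minimal sketch of the snapshot listing request with a client-side timeout, so the call fails fast instead of hanging. The host `localhost:9200`, plain HTTP, and the 30-second limit are assumptions; the repository name `myrepo` and the `/_snapshot/myrepo/_all` path match the request path that shows up in the master's log later in this thread. The function names are hypothetical.

```shell
# Build the "list all snapshots" URL for a repository.
# $1 = host:port of any ES node (assumption: plain HTTP), $2 = repository name
snapshot_url() {
  printf 'http://%s/_snapshot/%s/_all' "$1" "$2"
}

# List snapshots, giving up after $3 seconds (default 30) instead of hanging.
list_snapshots() {
  curl -sS --max-time "${3:-30}" "$(snapshot_url "$1" "$2")?pretty"
}

# Example (not run automatically):
#   list_snapshots localhost:9200 myrepo 30
```

The timeout only bounds how long the client waits; it does not explain why the cluster is slow to answer.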

Nothing related to S3 permissions or the instance running ES has changed (this is all managed by Terraform so I can be quite certain that the configuration has not changed in months).

Can anyone suggest anything I can do to diagnose the problem? Can I turn on debug logging of some kind for the repository-s3 plugin?

(Joe Littlejohn) #2

I realised that the logging for this error appears on the master. The stack trace I see in the log when this timeout occurs is:

[2017-10-17T11:38:04,620][WARN ][r.suppressed             ] path: /_snapshot/myrepo/_all, params: {repository=myrepo, snapshot=_all}
com.amazonaws.AmazonClientException: Unable to execute HTTP request: connect timed out
    at com.amazonaws.http.AmazonHttpClient.executeHelper( ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.doExecute( ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer( ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute( ~[?:?]
    at ~[?:?]
    at ~[?:?]
    at ~[?:?]
    at ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.readSnapshotIndexLatestBlob( ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId( ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData( ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData( ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation( [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation( [elasticsearch-5.3.3.jar:5.3.3]
    at [elasticsearch-5.3.3.jar:5.3.3]
    at$AsyncSingleAction$2.doRun( [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-5.3.3.jar:5.3.3]
    at [elasticsearch-5.3.3.jar:5.3.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$ [?:1.8.0_131]
    at [?:1.8.0_131]
Caused by: connect timed out
    at Method) ~[?:1.8.0_131]
    at ~[?:1.8.0_131]
    at ~[?:1.8.0_131]
    at ~[?:1.8.0_131]
    at ~[?:1.8.0_131]
    at ~[?:1.8.0_131]
    at ~[?:?]
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket( ~[?:?]
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket( ~[?:?]
    at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket( ~[?:?]
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection( ~[?:?]
    at ~[?:?]
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect( ~[?:?]
    at org.apache.http.impl.client.DefaultRequestDirector.execute( ~[?:?]
    at org.apache.http.impl.client.AbstractHttpClient.doExecute( ~[?:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute( ~[?:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute( ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest( ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.executeHelper( ~[?:?]
    ... 20 more

Based on the stack trace, it looks like ES is failing to get the index.latest file. When I attempt to get this file from the same host, using the AWS CLI, the file downloads immediately.
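
A rough sketch of that check with the AWS CLI, in case it helps someone else. The bucket name and base path are hypothetical placeholders (the real ones are whatever the repository was registered with), and the function names are made up for illustration. It assumes the AWS CLI is installed and picks up the same credentials as the instance.

```shell
# Build the S3 URI of the repository's index.latest object.
# $1 = bucket (hypothetical placeholder), $2 = optional base path in the bucket
index_latest_uri() {
  if [ -n "${2:-}" ]; then
    printf 's3://%s/%s/index.latest' "$1" "$2"
  else
    printf 's3://%s/index.latest' "$1"
  fi
}

# Stream the object to stdout ("-" as the destination) and discard it;
# only the success or failure of the download matters here.
fetch_index_latest() {
  aws s3 cp "$(index_latest_uri "$1" "${2:-}")" - > /dev/null
}

# Example (not run automatically):
#   time fetch_index_latest my-snapshot-bucket
```

If this succeeds quickly on the master node while ES still times out, the problem is more likely inside the ES process than in the network path or the IAM permissions.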

(David Pilato) #3

"Can I turn on debug logging of some kind for the repository-s3 plugin?"

In 5.3, the relevant packages are:

  • org.elasticsearch.plugin.repository.s3
  • org.elasticsearch.repositories.s3

In 5.6:

  • org.elasticsearch.repositories.s3

You can change the log level with:

curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "": "trace"
  }
}'

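The logger name in the command above appears to have been lost from the post. As a sketch only, and assuming the usual `logger.<package>` key form for dynamic logging settings, the request could be built from one of the package names listed earlier. The function names and the `ES_HOST` variable are hypothetical, and `localhost:9200` is the same assumed host as in the original command.

```shell
# Build the cluster-settings body that sets one logger to a given level.
# $1 = package name (e.g. org.elasticsearch.repositories.s3), $2 = level
# Assumption: the settings key is "logger." followed by the package name.
logger_settings_body() {
  printf '{"transient":{"logger.%s":"%s"}}' "$1" "$2"
}

# Apply the setting as a transient cluster setting.
set_log_level() {
  curl -sS -XPUT "http://${ES_HOST:-localhost:9200}/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(logger_settings_body "$1" "$2")"
}

# Example (not run automatically):
#   set_log_level org.elasticsearch.repositories.s3 trace
```

Because the setting is transient it does not survive a full cluster restart, so it is worth capturing the trace output before restarting any nodes.
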
Could you try upgrading to the latest 5.6 first?

@Igor_Motov Any other ideas?

(Joe Littlejohn) #4

Hmmm, I fixed this problem by restarting all nodes. So for some reason ES was failing to create an SSL connection to the S3 endpoint, but after a restart this succeeded.

Maybe a stale DNS entry that was cached by the JVM? I don't think this was an EC2/networking issue as curl was able to connect to S3 perfectly well from that box.
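
If the stale-DNS theory is worth testing next time, one rough check before restarting is to see whether a JVM-wide DNS cache TTL is configured and to note what the OS resolver returns at that moment. This is only a sketch: `networkaddress.cache.ttl` is the standard Java security property for caching successful lookups, but the location of the `java.security` file varies by JVM install, and the function names and the hostname argument are hypothetical. `getent` assumes a Linux host.

```shell
# Print the value of networkaddress.cache.ttl from a java.security file.
# Commented-out lines are ignored. Prints nothing if the property is not
# set in that file.
# $1 = path to java.security (location depends on the JVM install)
dns_cache_ttl() {
  grep -E '^[[:space:]]*networkaddress\.cache\.ttl[[:space:]]*=' "$1" \
    | tail -n 1 | cut -d= -f2 | tr -d '[:space:]'
}

# Show what the OS resolver currently returns for a hostname.
# $1 = hostname of the endpoint the JVM fails to reach
resolve_now() {
  getent hosts "$1"
}
```

Comparing the `resolve_now` output before and after a restart would at least show whether the address changed while the old JVM was running.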

Thanks for the advice re logging, this will come in handy next time.

(David Pilato) #5

Very good to know that could happen. Thanks for the closure on this.
