Cannot list snapshots in S3 repository (previously working fine)


(Joe Littlejohn) #1

Hi

I've been using the repository-s3 plugin with Elasticsearch 5.3.3 happily for months. A couple of days ago, our regular snapshot restore process started failing. It shows a timeout attempting to read the list of snapshots in the repository.

I can still see the repository listed when I list repositories using this URL:

http://myhost:9200/_snapshot/_all

but when I try to list the snapshots available in the repository like this:

http://myhost:9200/_snapshot/myrepo/_all

The request just hangs indefinitely. I see no logging at all on the ES host in /var/log/elasticsearch/elasticsearch.log

Nothing related to S3 permissions or the instance running ES has changed (this is all managed by Terraform so I can be quite certain that the configuration has not changed in months).

Can anyone suggest anything I can do to diagnose the problem? Can I turn on debug logging of some kind for the repository-s3 plugin?


(Joe Littlejohn) #2

I realised that the logging for this error appears on the master. The stack trace I see in the log when this timeout occurs is:

[2017-10-17T11:38:04,620][WARN ][r.suppressed             ] path: /_snapshot/myrepo/_all, params: {repository=myrepo, snapshot=_all}
com.amazonaws.AmazonClientException: Unable to execute HTTP request: connect timed out
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:618) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3654) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1092) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:970) ~[?:?]
    at org.elasticsearch.cloud.aws.blobstore.S3BlobContainer.readBlob(S3BlobContainer.java:84) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.readSnapshotIndexLatestBlob(BlobStoreRepository.java:912) ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:890) ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:753) ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:142) ~[elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:91) [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:50) [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.3.3.jar:5.3.3]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.3.jar:5.3.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_131]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_131]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_131]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_131]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_131]
    at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_131]
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:668) ~[?:?]
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:542) ~[?:?]
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:412) ~[?:?]
    at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:134) ~[?:?]
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:179) ~[?:?]
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:328) ~[?:?]
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:612) ~[?:?]
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:447) ~[?:?]
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884) ~[?:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[?:?]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607) ~[?:?]
    ... 20 more

Based on the stack trace, it looks like ES is failing to get the index.latest file. When I attempt to get this file from the same host, using the AWS CLI, the file downloads immediately.


(David Pilato) #3

Can I turn on debug logging of some kind for the repository-s3 plugin?

In 5.3, packages are:

  • org.elasticsearch.cloud.aws
  • org.elasticsearch.plugin.repository.s3
  • org.elasticsearch.repositories.s3

In 5.6:

  • org.elasticsearch.repositories.s3

You can change it with:

curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger.org.elasticsearch.repositories.s3": "trace"
  }
}
'

Wondering if you could try to upgrade first to latest 5.6?

@Igor_Motov Any other idea?


(Joe Littlejohn) #4

Hmmm, I fixed this problem by restarting all nodes. So for some reason ES was failing to talk to create an SSL connection to s3.cn-north-1.amazonaws.com.cn, but after a restart this succeeded.

Maybe a stale DNS entry that was cached by the JVM? I don't think this was an EC2/networking issue as curl was able to connect to S3 perfectly well from that box.

Thanks for the advice re logging, this will come in handy next time.


(David Pilato) #5

Very good to know that could happen. Thanks for the closure on this.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.