I've been using the repository-s3 plugin with Elasticsearch 5.3.3 happily for months. A couple of days ago, our regular snapshot restore process started failing. It shows a timeout attempting to read the list of snapshots in the repository.
I can still see the repository listed when I list repositories using this URL:
The request just hangs indefinitely. I see no logging at all on the ES host in /var/log/elasticsearch/elasticsearch.log
Nothing related to S3 permissions or the instance running ES has changed (this is all managed by Terraform so I can be quite certain that the configuration has not changed in months).
Can anyone suggest anything I can do to diagnose the problem? Can I turn on debug logging of some kind for the repository-s3 plugin?
I realised that the logging for this error appears on the master. The stack trace I see in the log when this timeout occurs is:
[2017-10-17T11:38:04,620][WARN ][r.suppressed ] path: /_snapshot/myrepo/_all, params: {repository=myrepo, snapshot=_all}
com.amazonaws.AmazonClientException: Unable to execute HTTP request: connect timed out
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:618) ~[?:?]
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376) ~[?:?]
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338) ~[?:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287) ~[?:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3654) ~[?:?]
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1092) ~[?:?]
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:970) ~[?:?]
at org.elasticsearch.cloud.aws.blobstore.S3BlobContainer.readBlob(S3BlobContainer.java:84) ~[?:?]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.readSnapshotIndexLatestBlob(BlobStoreRepository.java:912) ~[elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:890) ~[elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:753) ~[elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:142) ~[elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:91) [elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:50) [elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:87) [elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:166) [elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.3.3.jar:5.3.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.3.jar:5.3.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_131]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_131]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_131]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_131]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_131]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_131]
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:668) ~[?:?]
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:542) ~[?:?]
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:412) ~[?:?]
at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:134) ~[?:?]
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:179) ~[?:?]
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:328) ~[?:?]
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:612) ~[?:?]
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:447) ~[?:?]
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884) ~[?:?]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[?:?]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[?:?]
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837) ~[?:?]
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607) ~[?:?]
... 20 more
Based on the stack trace, it looks like ES is failing to get the index.latest file. When I attempt to get this file from the same host, using the AWS CLI, the file downloads immediately.
Hmmm, I fixed this problem by restarting all nodes. So for some reason ES was failing to talk to create an SSL connection to s3.cn-north-1.amazonaws.com.cn, but after a restart this succeeded.
Maybe a stale DNS entry that was cached by the JVM? I don't think this was an EC2/networking issue as curl was able to connect to S3 perfectly well from that box.
Thanks for the advice re logging, this will come in handy next time.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.