No progress in index restore

Anatoly_Petkevich · December 1, 2015, 11:35am

We have a snapshot of a 1 TB index on S3 and need to restore it on another cluster.
The version of Elasticsearch is 1.7.2, the version of the AWS Cloud Plugin is 2.7.1, and the number of primary shards is 19.
After a full day no shard has been restored, and tracking the restored index via the _status API shows that the size_in_bytes property does not grow steadily.
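
To make "does not grow steadily" concrete, here is a minimal sketch of how a series of size_in_bytes readings could be summarized. It assumes the readings were collected by polling at regular intervals; the function name summarize_growth and the sample numbers are hypothetical and are not taken from the real cluster.

```python
def summarize_growth(samples):
    """Summarize a list of size_in_bytes readings, oldest first.

    Returns the net growth plus how many consecutive pairs of readings
    showed a drop (size went down) or a stall (size unchanged).
    """
    # Difference between each reading and the one before it.
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    return {
        "net_growth": samples[-1] - samples[0] if samples else 0,
        "drops": sum(1 for d in deltas if d < 0),
        "stalls": sum(1 for d in deltas if d == 0),
    }


# Hypothetical readings: the size grows, stalls, falls back, then grows again.
report = summarize_growth([100, 250, 250, 120, 300])
```

A nonzero "drops" count would suggest that a shard restore failed and started over, which matches the pattern of repeated failures in the log.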
The log file contains a lot of warnings:

[2015-12-01 09:16:29,779][WARN ][indices.cluster ] [i-51a3e5ef] [[ii-documents][7]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [ii-documents][7] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [ii-documents][7] restore failed
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:135)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:109)
... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [ii-documents][7] failed to restore snapshot [snapshot_221120150700]
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:164)
at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:126)
... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [ii-documents][7] Failed to recover index
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:780)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
at sun.security.ssl.InputRecord.read(InputRecord.java:509)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:946)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:903)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
at java.io.InputStream.read(InputStream.java:101)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:813)
at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:777)
... 6 more

There is an open issue, https://github.com/elastic/elasticsearch-cloud-aws/issues/149, and a related topic, "Snapshot restore process is not finished".
Please advise whether there is a solution or workaround for this issue.

dadoonet · December 1, 2015, 11:54am

Is this happening once or every time you try to restore?

It sounds like a timeout happened while reading from the S3 bucket.

Anatoly_Petkevich · December 1, 2015, 12:13pm

It happens regularly; as a result, no more than 10% of the index data has been restored so far.

dadoonet · December 1, 2015, 3:23pm

I added a comment on the issue: repeated "Read timed out" errors when recovering a large sized shards from S3 repository (Issue #149, elastic/elasticsearch-cloud-aws).

Maybe you could also change the timeout, which is 50s by default.
I'm unsure if this will change anything.

I wonder if the connection between your machines and the S3 buckets is good enough. I assume they are in the same region?

The stack trace shows a typical AWS connection problem. Maybe we should add a retry by setting it on ClientConfiguration (AWS SDK for Java - 1.12.607), but the documentation says:

Sets the maximum number of retry attempts for failed retryable requests (ex: 5xx error responses from services).

I'm unsure whether a SocketTimeoutException counts as a retryable failure...
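
To illustrate the kind of retry being discussed, here is a minimal sketch in Python of retrying a read that fails with a socket timeout, with exponential backoff between attempts. It is only an illustration of the idea, not the plugin's or the AWS SDK's code: read_with_retry, read_chunk, and the default values are all hypothetical, and read_chunk stands in for a single read from S3.

```python
import socket
import time


def read_with_retry(read_chunk, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call read_chunk() and retry on socket timeouts with exponential backoff.

    read_chunk is a hypothetical zero-argument callable standing in for one
    read from S3; it is not part of Elasticsearch or the AWS SDK.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return read_chunk()
        except socket.timeout:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
```

Whether something like this helps depends on the read being safe to repeat from the point where it failed; a restore that restarts a whole file after every timeout may still make little progress on very large shards.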
