Restore from S3 times out after upgrading from ES 2.4 to 6.3

Restores always worked fine with Elasticsearch 2.4, but I recently upgraded to 6.3 and I haven't been able to get a restore to work even once for my largest index (14 GB, 7 million documents). The connection is 50 Mbit/s. This is a fresh S3 repository that has only ever held snapshots from the 6.3 cluster.
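
For reference, the repository was registered with the standard repository-s3 plugin, roughly like this (the bucket name here is a placeholder, not my real value):

PUT _snapshot/s3-backup
{
  "type": "s3",
  "settings": {
    "bucket": "my-backup-bucket"
  }
}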

About 4 of the 12 shards are restored, then the job is marked as completed while the cluster remains red.

I tripled the number of shards to 36; now about half of them restore, but the job is still marked as completed and the cluster remains red.
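
This is how I'm watching the restore, in case it matters (standard APIs, nothing custom):

GET _cluster/health?pretty
GET _cat/recovery?v
GET _cat/shards?v

Cluster health stays red and the failed shards show up as unassigned.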

The shard restores time out after only about 10 minutes. How do I increase the allowed time so they don't time out? I don't see any such option in the Curator restore config. I do have timeout_override: 21600 set, but it doesn't seem to be related, since that value allows 6 hours.
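
For completeness, here's roughly what my Curator restore action looks like (the selectors are simplified, so treat the exact values as illustrative):

actions:
  1:
    action: restore
    description: Restore the most recent successful snapshot from s3-backup
    options:
      repository: s3-backup
      # blank name = restore the most recent snapshot that passes the filters
      name:
      # blank indices = restore all indices in the snapshot
      indices:
      wait_for_completion: True
      timeout_override: 21600
    filters:
    - filtertype: state
      state: SUCCESS

As far as I can tell, timeout_override only affects how long Curator itself waits for the job, not the per-shard S3 read that's timing out.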

Like I said, this always worked fine with Elasticsearch 2.4. What would be different now?

Cluster logs below (it's actually just a single node in this cluster):

2018-07-27T00:57:22.627589600Z [2018-07-27T00:57:22,624][WARN ][o.e.c.r.a.AllocationService] [elasticsearch6] failing shard [failed shard, shard [products2][31], node[xP8M_9C_Q4yR1xTSKEL2dQ], [P], recovery_source[snapshot recovery from s3-backup:production-20180726160403/s6npdXTAQemKpSxNiMQbLg], s[INITIALIZING], a[id=RkRSXt6cRMqxTTWZd_ISKQ], unassigned_info[[reason=NEW_INDEX_RESTORED], at[2018-07-26T22:44:14.295Z], delayed=false, details[restore_source[s3-backup/production-20180726160403]], allocation_status[deciders_throttled]], message [failed recovery], failure [RecoveryFailedException[[products2][31]: Recovery failed on {elasticsearch6}{xP8M_9C_Q4yR1xTSKEL2dQ}{uKKWY-smQn-eBb3DFDSOZQ}{10.0.1.4}{10.0.1.4:9300}]; nested: IndexShardRecoveryException[failed recovery]; nested: IndexShardRestoreFailedException[restore failed]; nested: IndexShardRestoreFailedException[failed to restore snapshot [production-20180726160403/s6npdXTAQemKpSxNiMQbLg]]; nested: IndexShardRestoreFailedException[Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ], markAsStale [true]]

More logs that didn't fit in the 7000-character limit:

2018-07-27T00:57:22.627639500Z org.elasticsearch.indices.recovery.RecoveryFailedException: [products2][31]: Recovery failed on {elasticsearch6}{xP8M_9C_Q4yR1xTSKEL2dQ}{uKKWY-smQn-eBb3DFDSOZQ}{10.0.1.4}{10.0.1.4:9300}
2018-07-27T00:57:22.627645100Z 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2047) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627649200Z 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:626) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627652900Z 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
2018-07-27T00:57:22.627656200Z 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2018-07-27T00:57:22.627659600Z 	at java.lang.Thread.run(Thread.java:844) [?:?]
2018-07-27T00:57:22.627662800Z Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
2018-07-27T00:57:22.627666100Z 	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:343) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627671200Z 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627674900Z 	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627677400Z 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627679900Z 	... 4 more
2018-07-27T00:57:22.627682100Z Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
2018-07-27T00:57:22.627684500Z 	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:474) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627686900Z 	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627689300Z 	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627700000Z 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627702600Z 	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627705000Z 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627707400Z 	... 4 more
2018-07-27T00:57:22.627709600Z Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: failed to restore snapshot [production-20180726160403/s6npdXTAQemKpSxNiMQbLg]
2018-07-27T00:57:22.627712100Z 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:830) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627714500Z 	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:460) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627716800Z 	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627719200Z 	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627721500Z 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627724300Z 	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627726600Z 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627729000Z 	... 4 more
2018-07-27T00:57:22.627731200Z Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: Failed to recover index
2018-07-27T00:57:22.627734500Z 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1543) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627736900Z 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:828) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627739300Z 	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:460) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627741700Z 	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627744000Z 	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627746300Z 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627748700Z 	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627751000Z 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627755300Z 	... 4 more

And the rest of the relevant logs:

2018-07-27T00:57:22.627757600Z Caused by: java.net.SocketTimeoutException: Read timed out
2018-07-27T00:57:22.627759900Z 	at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
2018-07-27T00:57:22.627762200Z 	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:?]
2018-07-27T00:57:22.627764500Z 	at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:?]
2018-07-27T00:57:22.627766700Z 	at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:?]
2018-07-27T00:57:22.627769000Z 	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:425) ~[?:?]
2018-07-27T00:57:22.627771300Z 	at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:221) ~[?:?]
2018-07-27T00:57:22.627773600Z 	at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:178) ~[?:?]
2018-07-27T00:57:22.627775900Z 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1031) ~[?:?]
2018-07-27T00:57:22.627778500Z 	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:964) ~[?:?]
2018-07-27T00:57:22.627780800Z 	at sun.security.ssl.AppInputStream.read(AppInputStream.java:159) ~[?:?]
2018-07-27T00:57:22.627783100Z 	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[?:?]
2018-07-27T00:57:22.627785500Z 	at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198) ~[?:?]
2018-07-27T00:57:22.627787800Z 	at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) ~[?:?]
2018-07-27T00:57:22.627790100Z 	at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) ~[?:?]
2018-07-27T00:57:22.627792400Z 	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627794700Z 	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) ~[?:?]
2018-07-27T00:57:22.627797000Z 	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627799300Z 	at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125) ~[?:?]
2018-07-27T00:57:22.627801600Z 	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627804000Z 	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627806300Z 	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627808600Z 	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) ~[?:?]
2018-07-27T00:57:22.627810900Z 	at java.security.DigestInputStream.read(DigestInputStream.java:162) ~[?:?]
2018-07-27T00:57:22.627813100Z 	at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59) ~[?:?]
2018-07-27T00:57:22.627817100Z 	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627819500Z 	at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627821900Z 	at java.io.FilterInputStream.read(FilterInputStream.java:133) ~[?:?]
2018-07-27T00:57:22.627824100Z 	at org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.read(RateLimitingInputStream.java:69) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627826500Z 	at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[?:?]
2018-07-27T00:57:22.627829100Z 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restoreFile(BlobStoreRepository.java:1597) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627831500Z 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1540) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627833800Z 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:828) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627836200Z 	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:460) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627838600Z 	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627840900Z 	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627843300Z 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627845600Z 	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627847900Z 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627850300Z 	... 4 more
