Restores always worked fine with Elasticsearch 2.4, but I recently upgraded to 6.3 and I haven't been able to get a restore to complete even once for my largest index (14 GB, 7 million documents). The connection is 50 Mbit/s. This is a fresh S3 repository that has only ever held snapshots from the 6.3 cluster.
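For reference, the repository was registered directly on the 6.3 node (which has the repository-s3 plugin installed), roughly like this; the repository and bucket names here are placeholders:

```
PUT _snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}
```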
About 4 of the 12 shards are restored, then the job is marked as completed while the cluster remains red.
I tripled the number of shards to 36; now about half of them restore, but the job is still marked as completed and the cluster remains red.
The shard restores time out after about 10 minutes. How do I increase the time allowed so they don't time out? I don't see any such option in the Curator restore config. I do have timeout_override: 21600 set (see the action file below), but it doesn't seem to be related, since that value corresponds to 6 hours.
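My restore action looks roughly like this; the repository name is a placeholder, and the index name is the one from the logs below:

```yaml
actions:
  1:
    action: restore
    description: >-
      Restore products2 from the most recent successful snapshot
    options:
      repository: my_s3_repo   # placeholder name
      name:                    # blank = most recent snapshot
      indices: ['products2']
      wait_for_completion: True
      timeout_override: 21600
    filters:
    - filtertype: state
      state: SUCCESS
```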
Like I said, this always worked fine with Elasticsearch 2.4. What would be different now?
Logs from the cluster (actually just a single node) are below:
More logs that didn't fit within the 7000-character limit:
2018-07-27T00:57:22.627639500Z org.elasticsearch.indices.recovery.RecoveryFailedException: [products2][31]: Recovery failed on {elasticsearch6}{xP8M_9C_Q4yR1xTSKEL2dQ}{uKKWY-smQn-eBb3DFDSOZQ}{10.0.1.4}{10.0.1.4:9300}
2018-07-27T00:57:22.627645100Z at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2047) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627649200Z at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:626) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627652900Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
2018-07-27T00:57:22.627656200Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2018-07-27T00:57:22.627659600Z at java.lang.Thread.run(Thread.java:844) [?:?]
2018-07-27T00:57:22.627662800Z Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
2018-07-27T00:57:22.627666100Z at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:343) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627671200Z at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627674900Z at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627677400Z at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627679900Z ... 4 more
2018-07-27T00:57:22.627682100Z Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
2018-07-27T00:57:22.627684500Z at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:474) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627686900Z at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627689300Z at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627700000Z at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627702600Z at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627705000Z at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627707400Z ... 4 more
2018-07-27T00:57:22.627709600Z Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: failed to restore snapshot [production-20180726160403/s6npdXTAQemKpSxNiMQbLg]
2018-07-27T00:57:22.627712100Z at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:830) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627714500Z at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:460) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627716800Z at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627719200Z at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627721500Z at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627724300Z at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627726600Z at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627729000Z ... 4 more
2018-07-27T00:57:22.627731200Z Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: Failed to recover index
2018-07-27T00:57:22.627734500Z at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1543) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627736900Z at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:828) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627739300Z at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:460) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627741700Z at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627744000Z at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627746300Z at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627748700Z at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627751000Z at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627755300Z ... 4 more
2018-07-27T00:57:22.627757600Z Caused by: java.net.SocketTimeoutException: Read timed out
2018-07-27T00:57:22.627759900Z at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
2018-07-27T00:57:22.627762200Z at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:?]
2018-07-27T00:57:22.627764500Z at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:?]
2018-07-27T00:57:22.627766700Z at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:?]
2018-07-27T00:57:22.627769000Z at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:425) ~[?:?]
2018-07-27T00:57:22.627771300Z at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:221) ~[?:?]
2018-07-27T00:57:22.627773600Z at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:178) ~[?:?]
2018-07-27T00:57:22.627775900Z at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1031) ~[?:?]
2018-07-27T00:57:22.627778500Z at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:964) ~[?:?]
2018-07-27T00:57:22.627780800Z at sun.security.ssl.AppInputStream.read(AppInputStream.java:159) ~[?:?]
2018-07-27T00:57:22.627783100Z at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[?:?]
2018-07-27T00:57:22.627785500Z at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198) ~[?:?]
2018-07-27T00:57:22.627787800Z at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) ~[?:?]
2018-07-27T00:57:22.627790100Z at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) ~[?:?]
2018-07-27T00:57:22.627792400Z at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627794700Z at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) ~[?:?]
2018-07-27T00:57:22.627797000Z at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627799300Z at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125) ~[?:?]
2018-07-27T00:57:22.627801600Z at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627804000Z at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627806300Z at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627808600Z at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) ~[?:?]
2018-07-27T00:57:22.627810900Z at java.security.DigestInputStream.read(DigestInputStream.java:162) ~[?:?]
2018-07-27T00:57:22.627813100Z at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:59) ~[?:?]
2018-07-27T00:57:22.627817100Z at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82) ~[?:?]
2018-07-27T00:57:22.627819500Z at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627821900Z at java.io.FilterInputStream.read(FilterInputStream.java:133) ~[?:?]
2018-07-27T00:57:22.627824100Z at org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream.read(RateLimitingInputStream.java:69) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627826500Z at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[?:?]
2018-07-27T00:57:22.627829100Z at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restoreFile(BlobStoreRepository.java:1597) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627831500Z at org.elasticsearch.repositories.blobstore.BlobStoreRepository$RestoreContext.restore(BlobStoreRepository.java:1540) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627833800Z at org.elasticsearch.repositories.blobstore.BlobStoreRepository.restoreShard(BlobStoreRepository.java:828) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627836200Z at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:460) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627838600Z at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromRepository$5(StoreRecovery.java:278) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627840900Z at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:301) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627843300Z at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:276) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627845600Z at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1574) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627847900Z at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2043) ~[elasticsearch-6.3.1.jar:6.3.1]
2018-07-27T00:57:22.627850300Z ... 4 more