We are getting the error below on our clusters: shard 2 of index1 goes unassigned.
[2017-03-18 04:17:37,072][WARN ][indices.cluster ] [node-1] [index1][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [index1][2] failed to fetch index version after copying it over
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: [index1][2] Corrupted index [corrupted_edCfNV3vTemEXmU0w_bSDQ] caused by: CorruptIndexException[checksum failed (hardware problem?) : expected=1oapn8l actual=16cchb4 (resource=name [_dvzo_Lucene49_0.dvd], length [51695646], checksum [1oapn8l], writtenBy [LUCENE_4_9])]
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:434)
at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:419)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
... 4 more
Please help us debug this further.
We are using Elasticsearch 1.3.7.
Please don't tell us to upgrade the cluster; the upgrade is already in progress.
Older versions did not have checksums, so this kind of corruption could happen when shards were moved or segments were merged.
Upgrading, and staying on the most recent release of the major version you are using, would definitely have helped to avoid this, or at least to discover this kind of problem sooner.
Maybe you could try starting a 5.2 cluster and see whether the reindex from remote API helps.
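To make the reindex-from-remote suggestion concrete, here is a minimal sketch of the request it involves. The host names `old-node-1` and `new-node-1` are hypothetical placeholders, and the helper functions are only illustrative. It assumes the new 5.2 cluster lists the old node under `reindex.remote.whitelist` in its `elasticsearch.yml`. Note that reindex reads documents through search, so it can probably only copy what the old cluster can still serve; documents that live only in the corrupted shard may not come across.

```python
import json
import urllib.request


def build_reindex_body(remote_host, source_index, dest_index):
    """Build the JSON body for POST _reindex with a remote source."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }


def start_reindex(new_cluster_url, body):
    """POST the body to the new cluster's _reindex endpoint, return the parsed reply."""
    request = urllib.request.Request(
        new_cluster_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host names; replace them with your own nodes.
    body = build_reindex_body("http://old-node-1:9200", "index1", "index1")
    print(json.dumps(body, indent=2))
    # Uncomment once the new cluster is up and the old host is whitelisted:
    # print(start_reindex("http://new-node-1:9200", body))
```

The request is sent to the new cluster, which then pulls the documents from the old one, so nothing has to be installed or changed on the 1.3.7 nodes.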