Hi
In all our Elasticsearch clusters we use the elasticsearch-cloud-aws plugin
to create snapshots on S3 on a regular basis.
Sometimes our Elasticsearch logs show that a shard of an index has become
corrupted.
So we try to restore it from a backup, but during the restore we see the
same exception in the logs, as follows:
[2015-02-25 08:18:10,824][WARN ][indices.cluster ] [test-es-cluster-1e-data-2] [lst_p113_v_4_20140615_0000][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [lst_p113_v_4_20140615_0000][0] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [lst_p113_v_4_20140615_0000][0] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [lst_p113_v_4_20140615_0000][0] failed to restore snapshot [listening-prod6-20150224]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [lst_p113_v_4_20140615_0000][0] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=1lvsjli actual=3awj8p resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@7266a49d)
    at org.elasticsearch.index.store.LegacyVerification$Adler32VerifyingIndexOutput.verify(LegacyVerification.java:73)
    at org.elasticsearch.index.store.Store.verify(Store.java:365)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:843)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2015-02-25 08:18:10,826][WARN ][cluster.action.shard ] [test-es-cluster-1e-data-2] [lst_p113_v_4_20140615_0000][0] sending failed shard for [lst_p113_v_4_20140615_0000][0], node[shNgLjr8RlW7Zrk3P4UdPg], [P], restoring[aws-prod-elasticsearch-backup:listening-prod6-20150224], s[INITIALIZING], indexUUID [ZQKQ-6naQqeLP1Gk8IFsig], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[lst_p113_v_4_20140615_0000][0] failed recovery]; nested: IndexShardRestoreFailedException[[lst_p113_v_4_20140615_0000][0] restore failed]; nested: IndexShardRestoreFailedException[[lst_p113_v_4_20140615_0000][0] failed to restore snapshot [listening-prod6-20150224]]; nested: IndexShardRestoreFailedException[[lst_p113_v_4_20140615_0000][0] Failed to recover index]; nested: CorruptIndexException[checksum failed (hardware problem?) : expected=1lvsjli actual=3awj8p resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@7266a49d)]; ]]
Even when we go back to an older snapshot, we get the same exception.
So we downloaded all the segment files from S3, merged them, and ran
org.apache.lucene.index.CheckIndex on the result, which showed that some
segments were corrupted. Running CheckIndex with -fix (which drops the
corrupted segments) repaired the index, but we lost 5 GB of data.
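A possible way to narrow this down: the trace goes through LegacyVerification$Adler32VerifyingIndexOutput, and the expected=1lvsjli / actual=3awj8p values look like Adler-32 checksums written in base 36. That encoding is an assumption based on the format of the log, not something confirmed by the plugin. Under that assumption, a small Java program like the sketch below could recompute the checksum of a single file downloaded from the S3 repository and compare it with the expected value. This would help tell whether the file stored in S3 is already bad or whether it gets damaged during the restore. The class name LegacyChecksumCheck and its command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

// Hypothetical helper: recompute a file's Adler-32 and compare it with the
// "expected=" value from a CorruptIndexException log line.
public class LegacyChecksumCheck {

    // Adler-32 of an in-memory buffer (handy for small inputs and tests).
    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    // Adler-32 of a stream, read in 64 KiB chunks so large segment files
    // never have to fit in memory.
    static long adler32(InputStream in) throws IOException {
        Adler32 adler = new Adler32();
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            adler.update(buf, 0, n);
        }
        return adler.getValue();
    }

    // ASSUMPTION: the log prints checksums in base 36
    // (Character.MAX_RADIX == 36), e.g. "1lvsjli".
    static String toBase36(long checksum) {
        return Long.toString(checksum, Character.MAX_RADIX);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LegacyChecksumCheck <file> <expected-base36>");
            System.exit(2);
        }
        String expected = args[1];
        String actual;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            actual = toBase36(adler32(in));
        }
        boolean ok = actual.equals(expected);
        System.out.println("expected=" + expected + " actual=" + actual
                + (ok ? " (match)" : " (MISMATCH)"));
        System.exit(ok ? 0 : 1);
    }
}
```

For example, `java LegacyChecksumCheck /tmp/restore/<downloaded-file> 1lvsjli` (the path is a placeholder). A match would suggest the copy in S3 is intact and the problem happens while the restore writes the file locally. A mismatch would point at the stored snapshot data itself.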
We reported this problem to the elasticsearch-cloud-aws team, but they have
not replied so far.
Could you please look into this issue and suggest something?
Thanks
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9a7b1069-22b2-401b-a40c-096eb12db937%40googlegroups.com.