We suddenly started seeing these messages after launching another
instance:
---snip---
[2012-02-08 16:07:42,873][WARN ][cluster.action.shard ] [es1.dev.example.ec2] sending failed shard for [ideas][3], node[_lKO3A9mS5W3wrDdvuAZlg], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[ideas][3] No commit point data is available in gateway]]]
[2012-02-08 16:07:43,311][WARN ][index.gateway.s3 ] [es1.dev.studyblue.ec2] [ideas][3] listed commit_point [commit-16qe]/[55382], but not all files exists, ignoring
[2012-02-08 16:07:43,311][WARN ][indices.cluster ] [es1.dev.example.ec2] [ideas][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [ideas][3] No commit point data is available in gateway
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.recover(BlobStoreIndexShardGateway.java:434)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
---snip---
Everything I've read indicates that the data in shard 3 is lost. While
that's really unfortunate, it's not the end of the world. The real
problem is that every node in the cluster now logs this message 2-5
times PER SECOND, which is filling up our logfiles.
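As a stopgap, we're considering muting the noisy loggers in
config/logging.yml. I took the logger names from the categories in the
messages above, so I'm not positive these are the exact keys, but
something like:

---snip---
logger:
  cluster.action.shard: ERROR
  indices.cluster: ERROR
  index.gateway: ERROR
---snip---

That only hides the symptom, though.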
Is there any way to recover from this error? Or any way to just tell
the cluster to give up and "scrap" the data in shard 3?
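Worst case, I assume we could delete the whole "ideas" index via the
delete index API and rebuild it from our source data, i.e. something
like:

---snip---
curl -XDELETE 'http://localhost:9200/ideas/'
---snip---

but I'd rather not throw away the healthy shards if shard 3 can be
scrapped on its own.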
-S