Hi,
I have upgraded from ES 0.90.1 to 0.90.7 and everything runs fine until I
restart the cluster. I have about 130 shards in 23 indices, running on
Debian with Java 7 on a three-node cluster. Most of the time when I stop
and restart the cluster, at least one shard fails to come up and is
reported as corrupted in the logs. For instance, the latest case failed
with this exception:
[2013-12-11 11:33:19,544][WARN ][indices.cluster ] [Base] [1millionnew][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [1millionnew][3] failed to fetch index version after copying it over
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:136)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [1millionnew][3] shard allocated for local recovery (post api), should exist, but doesn't, current files:
..... a long list of files (my shards are quite big)
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:115)
        ... 4 more
Caused by: java.io.FileNotFoundException: segments_azw
        at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:456)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:318)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:380)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:812)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:663)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:376)
        at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:111)
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:106)
        ... 4 more
In this particular case, the index had been inactive (no search, no indexing)
for a considerable amount of time, so the shards seem to be failing randomly.
I have checked the open files limit; it is 65000 on all nodes for all users.
So (1) why are the shards failing in this particular case, and (2) how can I
fix the problem of the missing segments_N file? The shard is striped across
4 disks, and by looking at other shards of other indices I could not find any
pattern in which stripes the segments files end up on.
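
For reference, this is roughly how I went looking for a pattern across the
stripes. It is only a sketch: the data paths /data1../data4, the cluster name
and the shard count are placeholders, not my exact configuration, and the
directory layout is simply what I see on my nodes
(<path>/<cluster>/nodes/0/indices/<index>/<shard>/index):

    # Sketch: report which stripe(s) hold a segments_N file for each shard.
    # Paths, cluster name and shard count below are placeholders.
    import os
    import glob

    DATA_PATHS = ["/data1", "/data2", "/data3", "/data4"]  # the four stripes (assumed)
    CLUSTER = "elasticsearch"                              # assumed cluster name
    INDEX = "1millionnew"
    NUM_SHARDS = 5                                         # assumed shard count

    for shard in range(NUM_SHARDS):
        for path in DATA_PATHS:
            index_dir = os.path.join(path, CLUSTER, "nodes", "0",
                                     "indices", INDEX, str(shard), "index")
            segments = glob.glob(os.path.join(index_dir, "segments_*"))
            if segments:
                print(shard, path, [os.path.basename(s) for s in segments])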