Marking and sending shard failed due to [failed recovery]


#1

I'm testing ElasticSearch as a replacement for a big MySQL table (over 1.3 TB compressed).

While importing data I got the following error:

[2015-06-05 16:41:34,780][WARN ][index.engine             ] [Venus Dee Milo] [logs][3] failed engine [already closed by tragic event]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.codecs.lucene40.BitVector.clone(BitVector.java:78)
        at org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.newLiveDocs(Lucene40LiveDocsFormat.java:85)
        at org.apache.lucene.index.ReadersAndUpdates.initWritableLiveDocs(ReadersAndUpdates.java:268)
        at org.apache.lucene.index.BufferedUpdatesStream.applyTermDeletes(BufferedUpdatesStream.java:441)
        at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:286)
        at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3312)
        at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3303)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:420)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262)
        at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)
        at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
        at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
        at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
        at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
        at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:568)
        at org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:565)
        at org.elasticsearch.index.shard.IndexShard$EngineRefresher$1.run(IndexShard.java:1089)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I saw that I needed to increase ES_HEAP_SIZE, so I did that and restarted ElasticSearch. But now I keep getting the following error:

[2015-06-05 16:56:28,752][WARN ][cluster.action.shard     ] [Calvin Rankin] [logs][3] received shard failed for [logs][3], node[JbkJGd7XQV2-3ktQeGQXbA], [P], s[INITIALIZING], indexUUID [-RmOQ9o6RcOpSIKHxMp0fA], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logs][3] failed recovery]; nested: EngineCreationFailureException[[logs][3] failed to upgrade 3x segments]; nested: EOFException[read past EOF: NIOFSIndexInput(path="/var/lib/elasticsearch/elasticsearch/nodes/0/indices/logs/3/index/segments_6f")]; ]]
[2015-06-05 16:56:39,028][WARN ][indices.cluster          ] [Calvin Rankin] [[logs][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [logs][3] failed recovery
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [logs][3] failed to upgrade 3x segments
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:121)
        at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:24)
        at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1256)
        at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1251)
        at org.elasticsearch.index.shard.IndexShard.prepareForTranslogRecovery(IndexShard.java:782)
        at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:226)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
        ... 3 more
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/var/lib/elasticsearch/elasticsearch/nodes/0/indices/logs/3/index/segments_6f")
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:336)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
        at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
        at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:183)
        at org.elasticsearch.common.lucene.Lucene.indexNeeds3xUpgrading(Lucene.java:738)
        at org.elasticsearch.common.lucene.Lucene.upgradeLucene3xSegmentsMetadata(Lucene.java:749)
        at org.elasticsearch.index.engine.InternalEngine.upgrade3xSegments(InternalEngine.java:1072)
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:119)
        ... 9 more

What can I do? Is everything that was imported to ElasticSearch lost?

I'm running elasticsearch-1.5.1 on a CentOS 6.6 box.
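
A note on reading the second trace: the EOFException is thrown from readInt, called by indexNeeds3xUpgrading, so the read fails on the first four bytes of segments_6f. That suggests the commit file is shorter than 4 bytes (possibly empty), perhaps left behind when the engine died on the OutOfMemoryError, though that is an inference from the trace and not something confirmed here. A minimal Python sketch to check for this is below. The helper name find_short_segments_files and the 4-byte threshold are my own assumptions, and the script only lists files. It does not repair anything.

```python
import os
import re

# Lucene commit files are named segments_<generation>, where the
# generation is written in base 36 (digits and lowercase letters).
# This deliberately does not match segments.gen.
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")

# Assumption: the trace fails while reading the first 4-byte int,
# so anything shorter than 4 bytes cannot be a valid commit file.
MIN_BYTES = 4


def find_short_segments_files(index_dir, min_bytes=MIN_BYTES):
    """Return (name, size) for each segments_N file smaller than min_bytes."""
    short = []
    for name in sorted(os.listdir(index_dir)):
        if SEGMENTS_RE.match(name):
            size = os.path.getsize(os.path.join(index_dir, name))
            if size < min_bytes:
                short.append((name, size))
    return short


# Example call, using the shard path from the log above:
# find_short_segments_files(
#     "/var/lib/elasticsearch/elasticsearch/nodes/0/indices/logs/3/index")
```

If segments_6f shows up in the result, the failure is in the commit point itself and not necessarily in the segment data. Whether an older segments_N file in the same directory is still usable would need to be verified separately, on a copy of the shard directory.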


#2

I have the exact same thing... Did you get this resolved?


#3

Unfortunately no. I gave up using ElasticSearch for now since it seems to be very unreliable.

I'm testing other alternatives that look more promising than ES.


#4

Smart call. I have (had) a perfectly running 1.4.5 with no issues. After upgrading to 1.5 and doing a sudo service elasticsearch restart at noon, that was it... I got the same as you. Luckily this was on a non-mission-critical setup, so I downgraded back to 1.4.5, blew out the data and started over... now it's working like a champ. I would have to agree with you: ES 1.5 is not reliable. Hope you find a good solution.

