Indices not recovering after elasticsearch upgrade (1.0.2 -> 1.4.1)

Hi,

I just updated our test environment from 1.0.2 to 1.4.1 and some
indices failed to recover, which seems to be related to the checksum
verification introduced in 1.3.

[2014-11-28 09:40:48,019][WARN ][cluster.action.shard ] [NODE1]
[index][0] received shard failed for [index][0],
node[CWq_uCPhRKqGEAvtS1jkug], [P], s[INITIALIZING], indexUUID
[yJBShgqGQgi0q5NbMms0Sg], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[index][0] failed to fetch index
version after copying it over]; nested:
CorruptIndexException[[index][0] Preexisting corrupted index
[corrupted_JysmZSaLRXWN_BgqpRSo6Q] caused by:
CorruptIndexException[checksum failed (hardware problem?) :
expected=16ncx91 actual=1xc6e7g
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@1afc89e8)]
org.apache.lucene.index.CorruptIndexException: checksum failed
(hardware problem?) : expected=16ncx91 actual=1xc6e7g
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@1afc89e8)
at org.elasticsearch.index.store.LegacyVerification$Adler32VerifyingIndexOutput.verify(LegacyVerification.java:73)
at org.elasticsearch.index.store.Store.verify(Store.java:365)
at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:599)
at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:536)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
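As an aside, the expected/actual values in the exception do not look like
hex CRC32s; my assumption is that they are Adler32 checksums printed in
radix 36 (the verification class in the trace is the legacy
Adler32VerifyingIndexOutput). A minimal sketch of what I mean, under that
assumption, with the file path as a placeholder:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

// Sketch only: compute the Adler32 checksum of one segment file and print it
// the way the exception above seems to (radix-36 encoding is my assumption).
public class Adler32Check {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0])); // fine for a sketch, not for huge files
        Adler32 adler = new Adler32();
        adler.update(bytes, 0, bytes.length);
        // Values like "16ncx91" / "1xc6e7g" in the log look like this form.
        System.out.println(Long.toString(adler.getValue(), Character.MAX_RADIX));
    }
}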

In order to get the indices to recover, I checked them using
org.apache.lucene.index.CheckIndex; the indices seemed OK, as no errors
were reported. Reopening the indices did not solve the issue.
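For reference, this is roughly how I ran the check (a minimal sketch against
the Lucene 4.10.x API that ships with 1.4.1; the shard path is of course
specific to our setup):

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class RunCheckIndex {
    public static void main(String[] args) throws Exception {
        // args[0]: shard index directory, e.g. .../nodes/0/indices/index/0/index
        try (FSDirectory dir = FSDirectory.open(new File(args[0]))) {
            CheckIndex checker = new CheckIndex(dir);
            checker.setInfoStream(System.out);            // print per-segment details
            CheckIndex.Status status = checker.checkIndex();
            System.out.println(status.clean ? "index is clean" : "index has problems");
        }
    }
}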

After deleting the checksums file as well as the corrupted_XXX marker
file, the indices finally recovered correctly. I suppose that the
verification step is then simply skipped, as there are no checksums to
compare against.
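Concretely, what I removed were the files matching the patterns below in the
shard's index directory. The sketch only lists the candidates; the path
layout and the file-name prefixes are taken from our installation, so treat
them as assumptions:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ListMarkerFiles {
    public static void main(String[] args) throws IOException {
        // args[0]: e.g. /var/lib/elasticsearch/<cluster>/nodes/0/indices/index/0/index
        Path shardIndexDir = Paths.get(args[0]);
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(shardIndexDir, "{_checksums-*,corrupted_*}")) {
            for (Path file : files) {
                System.out.println("would delete: " + file);
                // Files.delete(file);  // only after backing up the shard directory
            }
        }
    }
}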

I am currently trying to understand the issue. Could it be that the
checksums file itself was corrupted? Also, while I did not see any direct
consequences of deleting the checksums files, I just want to be sure that
removing them does not cause any issues.

Any thoughts or help is greatly appreciated,
Michel


Anyone?
