Corrupted Shard on Recovery

David_Kleiner · September 3, 2014, 4:58am

Greetings,

I was trying to work around a slowly recovering replica set, so I changed the
number of replicas on the index to 0 and then back to 1. Now I'm getting this exception:

[2014-09-02 23:51:59,738][WARN ][indices.recovery ] [Salvador Dali] [...-2014.08.29][1] File corruption on recovery name [_40d_es090_0.pos], length [11345418], checksum [ekoi4m], writtenBy [LUCENE_4_9] local checksum OK
org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=ekoi4m actual=1pdwf09 (resource=name [_40d_es090_0.pos], length [11345418], checksum [ekoi4m], writtenBy [LUCENE_4_9])
    at org.elasticsearch.index.store.Store$VerifyingIndexOutput.readAndCompareChecksum(Store.java:684)
    at org.elasticsearch.index.store.Store$VerifyingIndexOutput.writeBytes(Store.java:696)
    at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:589)
    at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:533)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Any pointers?
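
For anyone comparing several such warnings, here is a minimal Python sketch that pulls the file name and the two checksums out of the exception message. The regular expression and the function name are hypothetical and only match the message format shown in this thread.

```python
import re

# Exception message copied from the log in this post.
LOG_LINE = (
    "org.apache.lucene.index.CorruptIndexException: checksum failed (hardware "
    "problem?) : expected=ekoi4m actual=1pdwf09 (resource=name "
    "[_40d_es090_0.pos], length [11345418], checksum [ekoi4m], writtenBy "
    "[LUCENE_4_9])"
)

# Hypothetical pattern: an assumption based on this one message, not an
# official description of the Lucene/Elasticsearch log format.
CHECKSUM_RE = re.compile(
    r"expected=(?P<expected>\w+)\s+actual=(?P<actual>\w+)\s+"
    r"\(resource=name\s+\[(?P<file>[^\]]+)\],\s+length\s+\[(?P<length>\d+)\]"
)


def parse_checksum_failure(line):
    """Return a dict describing the mismatch, or None if the line does not match."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["length"] = int(info["length"])  # file length in bytes
    return info


if __name__ == "__main__":
    print(parse_checksum_failure(LOG_LINE))
```

Grouping the parsed results by file name can show whether the same segment file fails on every recovery attempt or whether the failures move around.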

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cba135a4-7838-4ad5-b56c-439823f7653b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · September 3, 2014, 5:02am

Have you checked your hardware status, as the error suggests? I'd also run a
filesystem check to be safe.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

David_Kleiner · September 3, 2014, 8:10pm

Thanks Mark, it's a SATA RAID5 volume with an ext4 filesystem and the following
mount options:

/dev/sdb1 on /acc type ext4
(rw,noatime,data=writeback,barrier=0,nobh,errors=remount-ro)

and journal enabled.

Perhaps I'm being too aggressive with squeezing performance out of this fs?
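
As a rough aid for that question, the following Python sketch splits the mount option string above and flags the options usually described as trading crash safety for speed. The `RISKY_OPTIONS` table is an illustrative assumption, not an exhaustive list, and nothing in this thread shows that these options caused the checksum failure.

```python
# Mount options copied from the post above.
MOUNT_OPTIONS = "(rw,noatime,data=writeback,barrier=0,nobh,errors=remount-ro)"

# Assumption for illustration: ext4 options that relax durability guarantees.
RISKY_OPTIONS = {
    "data=writeback": "file data is not ordered against metadata journaling",
    "barrier=0": "write barriers are disabled",
}


def parse_mount_options(option_string):
    """Split '(rw,noatime,...)' into a list of individual option strings."""
    inner = option_string.strip("() \n")
    return [opt.strip() for opt in inner.split(",") if opt.strip()]


def risky_options(option_string):
    """Return the subset of options found in RISKY_OPTIONS, with a short reason."""
    return {
        opt: RISKY_OPTIONS[opt]
        for opt in parse_mount_options(option_string)
        if opt in RISKY_OPTIONS
    }


if __name__ == "__main__":
    for option, reason in risky_options(MOUNT_OPTIONS).items():
        print(f"{option}: {reason}")
```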

David_Kleiner · September 3, 2014, 9:13pm

Another data point: the replica shard that's trying to initialize keeps
growing past the size of its master counterpart.

master: 1.3G 1
copy #1: 24G 1
copy #2: 23G 1

The total index size is 6.28G, so something is not right here...
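
The comparison can be automated with a small Python sketch like the one below. It assumes the sizes are du-style values with binary units (an assumption about how the figures above were produced), and the function names are made up for illustration.

```python
# Sizes copied from the post above.
SIZES = {"master": "1.3G", "copy #1": "24G", "copy #2": "23G"}

# Assumption: du-style suffixes with binary multipliers.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def to_bytes(size):
    """Convert a string such as '1.3G' to an approximate byte count."""
    size = size.strip()
    suffix = size[-1].upper()
    if suffix in UNITS:
        return int(float(size[:-1]) * UNITS[suffix])
    return int(size)


def oversized_copies(sizes, primary="master"):
    """Return the names of copies that are larger than the primary shard."""
    limit = to_bytes(sizes[primary])
    return sorted(
        name
        for name, value in sizes.items()
        if name != primary and to_bytes(value) > limit
    )


if __name__ == "__main__":
    for name in oversized_copies(SIZES):
        ratio = to_bytes(SIZES[name]) / to_bytes(SIZES["master"])
        print(f"{name} is about {ratio:.1f}x the size of the master shard")
```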

Christoph_Tavan · November 11, 2014, 1:21pm

Hi David,

Did you manage to fix your issue? I'm observing exactly the same symptoms
as you with one of my indices. Hardware seems to be OK.

Thanks
Christoph

Robert_Muir_2 · November 11, 2014, 6:38pm

The error says "local checksum OK"... what version of elasticsearch
are you running?

If it's before 1.3.2, please read this:
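
A minimal Python sketch of the "before 1.3.2" check, assuming plain dotted numeric version strings (no suffixes such as "-SNAPSHOT"). The helper names are hypothetical.

```python
def parse_version(version):
    """Turn '1.3.1' into (1, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_before(version, threshold="1.3.2"):
    """True if `version` is strictly older than `threshold`."""
    return parse_version(version) < parse_version(threshold)


if __name__ == "__main__":
    for candidate in ("1.3.1", "1.3.2", "1.3.4"):
        print(candidate, "is before 1.3.2:", is_before(candidate))
```

Comparing tuples of integers avoids the trap of string comparison, where "1.10.0" would sort before "1.3.2".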

Christoph_Tavan · November 11, 2014, 6:42pm

I'm running 1.3.1. Thanks a lot for the hint. I will try to upgrade and let
you know.

What is the recommended way of upgrading? One minor version at a time or
can I do a rolling upgrade to 1.3.5?

Thanks!
Christoph

rmuir · November 11, 2014, 7:11pm

First, I would try the workaround mentioned in the article: disable
the compression and see if that fixes the issue.
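
The thread does not name the setting involved. The Python sketch below assumes the article refers to recovery compression, controlled in Elasticsearch 1.x by the `indices.recovery.compress` cluster setting; that setting name, the `localhost:9200` address, and the function name are assumptions to verify against the article before use. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_disable_compression_request(base_url="http://localhost:9200"):
    """Build (but do not send) a cluster settings update.

    Assumption: `indices.recovery.compress` is the setting the article means.
    """
    body = {"transient": {"indices.recovery.compress": False}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_disable_compression_request()
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually apply it, one would call urllib.request.urlopen(request).
```

A transient setting is used here so that the change disappears on a full cluster restart, which suits a temporary workaround ahead of an upgrade.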

David_Kleiner · November 11, 2014, 8:33pm

Hello Christoph,

Yes, I did. I removed a huge index we no longer needed and reduced the number
of replicas to 1, then 2, and that seems to have fixed it. I also upgraded to
1.3.4, so that could be a factor as well.

Cheers,

David
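
For reference, changing the replica count is an index settings update. The Python sketch below only prepares the sequence of calls as (method, path, body) tuples; the index name `logs-example` is hypothetical, and the exact sequence David used is not spelled out beyond the counts he mentions.

```python
import json


def replica_steps(index, replica_counts):
    """Describe one index settings update per requested replica count."""
    steps = []
    for count in replica_counts:
        body = json.dumps({"index": {"number_of_replicas": count}})
        steps.append(("PUT", f"/{index}/_settings", body))
    return steps


if __name__ == "__main__":
    # Hypothetical index name; the counts follow the post above.
    for method, path, body in replica_steps("logs-example", [1, 2]):
        print(method, path, body)
```

Waiting for the cluster to settle between steps lets each new replica finish recovering before the next one is requested.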

Christoph_Tavan · November 12, 2014, 7:06am

The workaround described in the release notes fixed my issue! Thanks a lot!
