Corrupted Shard on Recovery

Greetings,

I tried to work around a slowly recovering replica set by changing the number
of replicas on the index to 0 and then back to 1, and now I'm getting this
exception:


[2014-09-02 23:51:59,738][WARN ][indices.recovery ] [Salvador Dali] [...-2014.08.29][1] File corruption on recovery name [_40d_es090_0.pos], length [11345418], checksum [ekoi4m], writtenBy [LUCENE_4_9] local checksum OK
org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=ekoi4m actual=1pdwf09 (resource=name [_40d_es090_0.pos], length [11345418], checksum [ekoi4m], writtenBy [LUCENE_4_9])
    at org.elasticsearch.index.store.Store$VerifyingIndexOutput.readAndCompareChecksum(Store.java:684)
    at org.elasticsearch.index.store.Store$VerifyingIndexOutput.writeBytes(Store.java:696)
    at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:589)
    at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:533)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Any pointers?
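
For anyone reading along, the exception message above carries the two
checksums that disagree. Here is a minimal Python sketch that pulls them out
of the message text. The helper name `parse_checksum_failure` and the regular
expression are illustrative assumptions based only on the message format shown
in this post, not on any Elasticsearch tooling.

```python
import re

# The exception message from the log above, joined back into one string.
MESSAGE = (
    "org.apache.lucene.index.CorruptIndexException: checksum failed (hardware "
    "problem?) : expected=ekoi4m actual=1pdwf09 (resource=name "
    "[_40d_es090_0.pos], length [11345418], checksum [ekoi4m], writtenBy "
    "[LUCENE_4_9])"
)


def parse_checksum_failure(message):
    """Extract the file name and the expected/actual checksums from the message."""
    pattern = r"expected=(\w+) actual=(\w+) \(resource=name \[([^\]]+)\]"
    match = re.search(pattern, message)
    if match is None:
        return None
    expected, actual, name = match.groups()
    return {
        "file": name,
        "expected": expected,
        "actual": actual,
        "mismatch": expected != actual,
    }


print(parse_checksum_failure(MESSAGE))
```

The "expected" value matches the checksum recorded for the file, while the
"actual" value is what was computed from the bytes received during recovery.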

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cba135a4-7838-4ad5-b56c-439823f7653b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Have you checked your hardware status, as the error suggests? I'd also run a
filesystem check to be safe.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Thanks Mark. It's a SATA RAID5 volume with an ext4 filesystem and the
following mount options:

/dev/sdb1 on /acc type ext4 (rw,noatime,data=writeback,barrier=0,nobh,errors=remount-ro)

and the journal is enabled.

Perhaps I'm being too aggressive in squeezing performance out of this
filesystem?
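
As a rough illustration of which of those options are the aggressive ones,
here is a small Python sketch that scans a line of `mount` output.
`barrier=0` turns off write barriers and `data=writeback` does not order file
data against journaled metadata, so both trade crash safety for speed. The
`RISKY` table is an illustrative, non-exhaustive assumption, and nothing here
shows that these options caused the checksum failure.

```python
import re

# Options that trade crash safety for speed on ext4 (illustrative, not exhaustive).
RISKY = {
    "barrier=0": "write barriers disabled",
    "nobarrier": "write barriers disabled",
    "data=writeback": "data is not ordered against journaled metadata",
}


def risky_mount_options(mount_line):
    """Return {option: reason} for risky options found in one line of `mount` output."""
    match = re.search(r"\(([^)]*)\)", mount_line)
    options = match.group(1).split(",") if match else []
    return {opt: RISKY[opt] for opt in options if opt in RISKY}


line = (
    "/dev/sdb1 on /acc type ext4 "
    "(rw,noatime,data=writeback,barrier=0,nobh,errors=remount-ro)"
)
for opt, reason in risky_mount_options(line).items():
    print(opt, "->", reason)
```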

Another data point: the replica shard that's trying to initialize keeps
growing past the size of its master (primary) counterpart.

master: 1.3G 1
copy #1: 24G 1
copy #2: 23G 1

The total index size is 6.28G, so something is not right here...
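
To put a number on the mismatch, here is a short Python sketch that compares
the copy sizes against the master. It assumes the sizes are `du`-style strings
like the ones listed above. The 1.5x threshold and the helper names are
arbitrary choices for illustration.

```python
# Binary multipliers for du-style suffixes.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a du-style size such as '1.3G' to a number of bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def oversized_copies(primary, copies, factor=1.5):
    """Return the copies whose size exceeds `factor` times the primary's size."""
    limit = parse_size(primary) * factor
    return [c for c in copies if parse_size(c) > limit]


print(oversized_copies("1.3G", ["24G", "23G"]))
```

Both copies are flagged here, since a healthy replica should end up close to
the size of the shard it is copied from.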

Hi David,

Did you manage to fix your issue? I'm observing exactly the same symptoms
as you with one of my indices. Hardware seems to be OK.

Thanks
Christoph

The error says "local checksum OK"... what version of elasticsearch
are you running?

If it's before 1.3.2, please read this:

I'm running 1.3.1. Thanks a lot for the hint. I will try to upgrade and let
you know.

What is the recommended way of upgrading? One minor version at a time or
can I do a rolling upgrade to 1.3.5?

Thanks!
Christoph

First, I would try the workaround mentioned in the article: disable
the compression and see if that fixes the issue.
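
A hedged sketch of what that could look like from Python, assuming the
compression in question is controlled by the `indices.recovery.compress`
cluster setting (this thread does not name the setting, so check it against
the article for your version). The host `http://localhost:9200` and the helper
names are also assumptions.

```python
import json
import urllib.request


def disable_recovery_compression_request(host="http://localhost:9200"):
    """Build (url, body) for a transient cluster-settings update.

    Assumption: the relevant setting is `indices.recovery.compress`.
    """
    url = host + "/_cluster/settings"
    body = json.dumps({"transient": {"indices.recovery.compress": False}})
    return url, body


def send(url, body):
    """Send the PUT request; needs a reachable cluster."""
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url, body = disable_recovery_compression_request()
    print("PUT", url, body)  # call send(url, body) against a real cluster
```

A transient setting is used so the change does not outlive a full cluster
restart, which seems appropriate for a temporary workaround.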

Hello Christoph,

Yes, I did. I removed a huge index we no longer needed and reduced the number
of replicas to 1, then set it to 2, and that seems to have fixed it. I also
upgraded to 1.3.4, so that could be a factor as well.

Cheers,

David
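
For reference, replica-count changes like the ones described in this thread
go through the index update-settings API. A minimal Python sketch, assuming a
node on `http://localhost:9200` and a hypothetical index name `my-index` (the
real index name is elided in the log). `replica_settings_request` and `send`
are illustrative helpers, not part of any client library.

```python
import json
import urllib.request


def replica_settings_request(index, replicas, host="http://localhost:9200"):
    """Build (url, body) for PUT /<index>/_settings that sets the replica count."""
    url = "{}/{}/_settings".format(host, index)
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return url, body


def send(url, body):
    """Send the PUT request; needs a reachable cluster."""
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Set the replica count to 1, then to 2, as described above.
    for count in (1, 2):
        url, body = replica_settings_request("my-index", count)
        print("PUT", url, body)  # call send(url, body) against a real cluster
```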

The workaround described in the release notes fixed my issue! Thanks a lot!
