CorruptIndexException when trying to replicate one shard of a new index

Created and populated a new index on a 1.3.1 cluster. Primary shards work
fine. Updated the index to create several replicas, and three of the four
shards replicated, but one shard fails to replicate on any node with the
following error (abbreviated some of the hashes for readability):

[2014-10-22 20:31:54,549][WARN ][index.engine.internal ] [NODENAME] [INDEXNAME][2] failed engine [corrupted preexisting index]

[2014-10-22 20:31:54,549][WARN ][indices.cluster ] [NODENAME] [INDEXNAME][2] failed to start shard
org.apache.lucene.index.CorruptIndexException: [INDEXNAME][2] Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: actual footer=1161826848 vs expected footer=-1071082520 (resource: MMapIndexInput(path="DATAPATH/INDEXNAME/2/index/_7cp.fdt"))]
    at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
    at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:723)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:576)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:183)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

[2014-10-22 20:31:54,549][WARN ][cluster.action.shard ] [NODENAME] [INDEXNAME][2] sending failed shard for [INDEXNAME][2], node[NODEID], [R], s[INITIALIZING], indexUUID [INDEXID], reason [Failed to start shard, message [CorruptIndexException[[INDEXNAME][2] Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: actual footer=1161826848 vs expected footer=-1071082520 (resource: MMapIndexInput(path="DATAPATH/INDEXNAME/2/index/_7cp.fdt"))]]]]

[2014-10-22 20:31:54,550][WARN ][cluster.action.shard ] [NODENAME] [INDEXNAME][2] sending failed shard for [INDEXNAME][2], node[NODEID], [R], s[INITIALIZING], indexUUID [INDEXID], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[INDEXNAME][2] Corrupted index [CORRUPTED] caused by: CorruptIndexException[codec footer mismatch: actual footer=1161826848 vs expected footer=-1071082520 (resource: MMapIndexInput(path="DATAPATH/INDEXNAME/2/index/_7cp.fdt"))]]]]

The index is stuck now in a state where the shards try to replicate on one
set of nodes, hit this failure, and then switch to try to replicate on a
different set of nodes. Have been looking around to see if anyone's
encountered a similar issue but haven't found anything useful yet. Anybody
know if this is recoverable or if I should just scrap it and try building a
new one?

  • Nate
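
A note on reading the error in the post above: the "expected footer" value looks like the bitwise complement of Lucene's codec magic number, read as a signed 32-bit Java int. The sketch below checks that arithmetic in Python. It assumes the magic is 0x3fd76c17 and that the footer magic is its complement; neither constant is stated anywhere in this thread, and `to_signed32` is a helper made up for the illustration.

```python
# Assumption: Lucene's codec header magic is 0x3fd76c17 and the footer
# magic is its bitwise complement. Neither constant appears in the thread.
CODEC_MAGIC = 0x3FD76C17


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed Java int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


FOOTER_MAGIC = to_signed32(~CODEC_MAGIC)

# Values copied from the log message in the original post.
expected_footer = -1071082520
actual_footer = 1161826848

# Does the "expected" value in the log equal the assumed footer magic?
print(FOOTER_MAGIC == expected_footer)
# The bytes that were actually found where the footer magic should be.
print(hex(actual_footer & 0xFFFFFFFF))
```

If the first comparison holds, the check found other bytes where the footer marker should sit at the end of `_7cp.fdt`, which would fit a file that was damaged while being copied to the replica.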

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/51f1b345-a19d-4c70-873f-a88880d47e5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Can you try the workaround mentioned here:

and see if it works? If the compression issue is the problem, you can
re-enable compression once you upgrade to at least 1.3.2, which has the
fix.
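
The link to the workaround did not survive in this archive. As a rough sketch only: assuming the workaround is to switch off recovery compression through the cluster update settings API, and assuming the setting is named `indices.recovery.compress` (both are assumptions, not confirmed by this thread), the request could be built like this in Python. The host `localhost:9200` is a placeholder and `build_disable_compression_request` is a hypothetical helper; nothing is sent until `urlopen` is called.

```python
import json
import urllib.request


def build_disable_compression_request(
    base_url: str = "http://localhost:9200",
) -> urllib.request.Request:
    """Build (but do not send) a cluster settings update request."""
    # Assumption: this is the setting the workaround refers to.
    body = {"transient": {"indices.recovery.compress": False}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_disable_compression_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a live cluster: urllib.request.urlopen(req)
```

A transient setting does not survive a full cluster restart, which seems appropriate for a temporary workaround that is meant to be reverted after the upgrade.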

On Wed, Oct 22, 2014 at 4:57 PM, Nate Folkert nfolkert@foursquare.com wrote:

Created and populated a new index on a 1.3.1 cluster. Primary shards work
fine. Updated the index to create several replicas, and three of the four
shards replicated, but one shard fails to replicate on any node [...]

After disabling compression, I was able to successfully replicate that
shard, so looks like we're hitting that bug. I guess we'll have to upgrade!

Thanks!

  • Nate
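
Before turning compression back on, it may be worth confirming that every node reports at least 1.3.2, the version named in the reply above. A minimal sketch of that comparison in Python, assuming plain dotted version strings such as "1.3.1"; how the strings are collected from the nodes is left out, and the sample values are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn "1.3.1" into (1, 3, 1); suffixes like "-SNAPSHOT" are not handled."""
    return tuple(int(part) for part in version.split("."))


FIXED_IN = parse_version("1.3.2")  # version named in the reply above


def has_fix(version: str) -> bool:
    """True if the given version is at or past the release with the fix."""
    return parse_version(version) >= FIXED_IN


node_versions = ["1.3.1", "1.3.2", "1.4.0"]  # hypothetical example values
still_affected = [v for v in node_versions if not has_fix(v)]
print(still_affected)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "1.10.0" sorting before "1.3.2".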

On Wednesday, October 22, 2014 5:26:42 PM UTC-4, Robert Muir wrote:

Can you try the workaround mentioned here:

and see if it works? If the compression issue is the problem, you can
re-enable compression once you upgrade to at least 1.3.2, which has the
fix.


Thanks for closing the loop.

On Wed, Oct 22, 2014 at 6:01 PM, Nate Folkert nfolkert@foursquare.com wrote:

After disabling compression, I was able to successfully replicate that
shard, so looks like we're hitting that bug. I guess we'll have to upgrade!

Thanks!

  • Nate
