Recovering From Corrupted Shard Following Upgrade to 1.3.1

Nariman_Haghighi · August 6, 2014, 4:44pm

A few days after the upgrade to 1.3.1, we experienced our first corrupted
shard in a two-node cluster:

[2014-08-06 15:54:28,815][WARN ][indices.cluster ] [FiveAces.Coffee.Web_IN_0] [streamentry5][4] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [streamentry5][4] failed to fetch index version after copying it over
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:152)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.index.CorruptIndexException: [streamentry5][4] Corrupted index [corrupted_fuDt8NuqR_egGJK0fcjl6g] caused by: CorruptIndexException[Invalid fieldsStream maxPointer (file truncated?): maxPointer=6833538, length=524288]
    at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:343)
    at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:328)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:119)
    ... 4 more
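
For anyone reading the trace, the "Caused by" line carries the useful numbers. The following is a minimal sketch, not part of the original post, that pulls them out with a regular expression; the function name parse_corruption and the field names are made up for illustration. It suggests the stored-fields file is expected to be at least maxPointer bytes long but is only length bytes on disk, which is consistent with the "file truncated?" hint.

```python
import re

# The "Caused by" line from the log above, joined back onto a single line.
CAUSE = (
    "Caused by: org.apache.lucene.index.CorruptIndexException: [streamentry5][4] "
    "Corrupted index [corrupted_fuDt8NuqR_egGJK0fcjl6g] caused by: "
    "CorruptIndexException[Invalid fieldsStream maxPointer (file truncated?): "
    "maxPointer=6833538, length=524288]"
)

# Index name, shard number, corruption marker name, and the two sizes.
PATTERN = re.compile(
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s+"
    r"Corrupted index \[(?P<marker>corrupted_[^\]]+)\]"
    r".*?maxPointer=(?P<max_pointer>\d+), length=(?P<length>\d+)"
)


def parse_corruption(line):
    """Return the corruption details from a log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    max_pointer = int(match.group("max_pointer"))
    length = int(match.group("length"))
    return {
        "index": match.group("index"),
        "shard": int(match.group("shard")),
        "marker": match.group("marker"),
        "max_pointer": max_pointer,
        "length": length,
        # Bytes the file appears to be missing relative to what it references.
        "missing_bytes": max(0, max_pointer - length),
    }


info = parse_corruption(CAUSE)
print(info)
```

The on-disk length of 524288 bytes is exactly 512 KiB, which may point to a write that stopped at a buffer boundary, though the log alone cannot confirm that.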

How do we recover from this?

We've tried explicitly assigning via the reroute API:

{
  "commands" : [ {
    "allocate" : {
      "index" : "streamentry5",
      "shard" : 4,
      "node" : "FiveAces.Coffee.Web_IN_0",
      "allow_primary" : 1
    }
  } ]
}

This puts the shard in INITIALIZING, but it quickly reverts to UNASSIGNED
with a similar error in the logs.
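
For reference, the reroute request described above could be built and sent from a script along these lines. This is only a sketch: the helper names build_allocate_command and post_reroute are hypothetical, the base URL http://localhost:9200 is an assumption about where the cluster listens, and allow_primary is written as a JSON boolean rather than 1. Forcing primary allocation can discard data, so it is off by default here.

```python
import json
import urllib.request


def build_allocate_command(index, shard, node, allow_primary=False):
    """Build the body for a cluster reroute "allocate" command."""
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "allow_primary": allow_primary,
                }
            }
        ]
    }


def post_reroute(body, base_url="http://localhost:9200"):
    """POST the body to the cluster reroute endpoint (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + "/_cluster/reroute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same index, shard, and node as in the command above.
body = build_allocate_command(
    "streamentry5", 4, "FiveAces.Coffee.Web_IN_0", allow_primary=True
)
print(json.dumps(body, indent=2))
# post_reroute(body)  # left commented out: it changes cluster state
```

Since the shard falls back out of INITIALIZING with the same error, resending this request is unlikely to help on its own while the corrupted files are still on that node.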

I'm interested in theories on how this could have happened, given that we
made no significant changes during this period and have never experienced
this on ES before. More importantly, I'd like to know how to recover from it.

Thank you

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d225d1cd-79a6-455c-a4d0-6cf0dfd88314%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nariman_Haghighi · August 6, 2014, 5:23pm

I should mention that there is a primary shard 4 on the other node. I just
need to understand why it's not auto-recovering here, and what I can do to
manually remove the corrupted shard so that the primary is replicated to
this node.
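
As a starting point for locating the corrupted copy on disk, here is a small sketch that walks a data directory and lists files whose names start with corrupted_, the prefix of the marker named in the log. It assumes the marker is stored as a file somewhere under the node's data path; the function name find_corruption_markers and the example path are hypothetical, and the sketch only lists files and deletes nothing.

```python
import os


def find_corruption_markers(data_root, prefix="corrupted_"):
    """Return sorted paths of files under data_root whose names start with prefix."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in filenames:
            if name.startswith(prefix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical data path; substitute the node's actual path.data setting.
    for path in find_corruption_markers("/var/lib/elasticsearch"):
        print(path)
```

Any manual cleanup based on this listing should be done with the node stopped and a backup of the shard directory taken first.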

