Shard corruption

Jongyoon_Lee · April 25, 2014, 12:40am

We have several ES clusters running 1.0.0. A shard appears to be corrupted
on a couple of them. Indexing into one of the shards throws an NPE, and
documents do not appear to be indexed on that shard from then on.

[2014-04-24T15:10:12.181+0000][DEBUG][action.bulk ] [prod-search-42] [main.5.2][81] failed to execute bulk item (index) index {[main.5.2][Attachment][......-1452300182662303400-WNFL11.xlsx-1], source[{"guid":"1452300182662303400","account_id":"......","estimated_size":12662,"filename":"WNFL11.xlsx","content_type":"APPLICATION/VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.SHEET;\r\n\tname=WNFL11.xlsx"}]}

java.lang.NullPointerException
at org.elasticsearch.common.lucene.uid.Versions.loadDocIdAndVersion(Versions.java:93)
at org.elasticsearch.common.lucene.uid.Versions.loadDocIdAndVersion(Versions.java:65)
at org.elasticsearch.common.lucene.uid.Versions.loadVersion(Versions.java:82)
at org.elasticsearch.index.engine.internal.InternalEngine.loadCurrentVersionFromIndex(InternalEngine.java:1321)
at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:504)
at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:479)
at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:404)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:395)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:153)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

At this point shard 81 appears to be corrupt, and every new index request
on this shard fails with a bulk index error. I share the concern of the
previous poster who reported a shard failure: this is a critical error,
yet it is logged only at DEBUG. What can we do at this point, short of
rebuilding the indices? Is there anything we can do to prevent this
problem from happening in the future?
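
Since these failures only show up at DEBUG, one way to see which shards are
affected is to tally the bulk failures per index and shard from the node
logs. The following is a minimal sketch, not anything shipped with
Elasticsearch: the regular expression is inferred from the single log line
quoted above, it assumes each log entry sits on one line, and the names
BULK_FAILURE and count_bulk_failures are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted in this post (an assumption):
# [timestamp][LEVEL][action.bulk ] [node] [index][shard] failed to execute bulk item ...
BULK_FAILURE = re.compile(
    r"\[(?P<level>[A-Z]+)\s*\]\[action\.bulk\s*\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*"
    r"\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s*"
    r"failed to execute bulk item"
)


def count_bulk_failures(lines):
    """Count bulk item failures per (index name, shard number)."""
    counts = Counter()
    for line in lines:
        match = BULK_FAILURE.search(line)
        if match:
            key = (match.group("index"), int(match.group("shard")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_bulk_failures.py < node.log
    for (index, shard), n in count_bulk_failures(sys.stdin).most_common():
        print(f"{index} shard {shard}: {n} failed bulk items")
```

A shard whose count keeps growing while the others stay at zero would match
the behaviour described here, although the tally alone does not prove the
shard is corrupt.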

Jongyoon Lee
