Merge Error with Lucene 4.1

While testing the master build with Lucene 4.1, I got a permanent merge
error filling up my logs.
Has anyone else seen it? Anything I can provide to help debug?

[2013-02-24 22:43:38,863][WARN ][index.merge.scheduler ] [Enchantress] [events][0] failed to merge
java.lang.ArrayIndexOutOfBoundsException
    at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)
    at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$ChunkIterator.decompress(CompressingStoredFieldsReader.java:388)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:368)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:283)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3698)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3303)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:90)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)

Thanks!

-Taras

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
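For intuition about the exception above: LZ4 decompression rebuilds data by copying literal runs and back-references into an output buffer, so a corrupt offset or length inside the compressed chunk makes those copies land out of bounds. The following is a toy Python analogy of that failure mode (it is not Lucene's actual LZ4 code, just an illustration of back-reference copying):

```python
# Toy model of an LZ4-style back-reference copy: copy match_len bytes
# starting match_offset bytes behind the current end of the output buffer.
# A corrupt offset that reaches before the buffer start (or a bogus length)
# is the kind of input that produces an ArrayIndexOutOfBoundsException
# in a real decompressor.

def copy_match(dest: bytearray, match_offset: int, match_len: int) -> None:
    start = len(dest) - match_offset
    if start < 0:
        raise IndexError(f"match offset {match_offset} reaches before buffer start")
    for i in range(match_len):
        # Overlapping copies are legal in LZ4 and repeat recent output.
        dest.append(dest[start + i])

buf = bytearray(b"abcd")
copy_match(buf, 2, 4)          # valid overlapping match: repeats "cd"
assert bytes(buf) == b"abcdcdcd"

corrupt = bytearray(b"ab")
try:
    copy_match(corrupt, 10, 4)  # corrupt offset points before the buffer
except IndexError as e:
    print("decompression fails:", e)
```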

hey,

Can you give some more background here? For example, did you start this on
an old index, i.e. an index you created with a previous version of ES /
Lucene? Can you reproduce this?

simon

On Monday, February 25, 2013 7:54:46 AM UTC+1, Taras Shkvarchuk wrote:


Hey Taras,

It would be awesome if you could provide some more info about your problem
so we can fix it quickly.

simon

On Monday, February 25, 2013 9:42:20 AM UTC+1, simonw wrote:


I'm not sure what type of info you're looking for, so you can e-mail me for
more details; I can probably even provide the index in question for
debugging.
There are 420K root-level documents, all of which have small nested
documents. Re-indexing nearly the same data resulted in an index without
corruption.

Here is the output from the _status and _segments calls.
{
  "ok" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "events" : {
      "index" : {
        "primary_size" : "673.3mb",
        "primary_size_in_bytes" : 706043580,
        "size" : "673.3mb",
        "size_in_bytes" : 706043580
      },
      "translog" : {
        "operations" : 0
      },
      "docs" : {
        "num_docs" : 1869131,
        "max_doc" : 2263281,
        "deleted_docs" : 394150
      },
      "merges" : {
        "current" : 0,
        "current_docs" : 0,
        "current_size" : "0b",
        "current_size_in_bytes" : 0,
        "total" : 1,
        "total_time" : "3.7s",
        "total_time_in_millis" : 3776,
        "total_docs" : 2263281,
        "total_size" : "0b",
        "total_size_in_bytes" : 0
      },
      "refresh" : {
        "total" : 1,
        "total_time" : "0s",
        "total_time_in_millis" : 0
      },
      "flush" : {
        "total" : 1,
        "total_time" : "18ms",
        "total_time_in_millis" : 18
      },
      "shards" : {
        "0" : [ {
          "routing" : {
            "state" : "STARTED",
            "primary" : true,
            "node" : "vwbSeFjtQ3SqdbJzh8V-uA",
            "relocating_node" : null,
            "shard" : 0,
            "index" : "events"
          },
          "state" : "STARTED",
          "index" : {
            "size" : "673.3mb",
            "size_in_bytes" : 706043580
          },
          "translog" : {
            "id" : 1361432592857,
            "operations" : 0
          },
          "docs" : {
            "num_docs" : 1869131,
            "max_doc" : 2263281,
            "deleted_docs" : 394150
          },
          "merges" : {
            "current" : 0,
            "current_docs" : 0,
            "current_size" : "0b",
            "current_size_in_bytes" : 0,
            "total" : 1,
            "total_time" : "3.7s",
            "total_time_in_millis" : 3776,
            "total_docs" : 2263281,
            "total_size" : "0b",
            "total_size_in_bytes" : 0
          },
          "refresh" : {
            "total" : 1,
            "total_time" : "0s",
            "total_time_in_millis" : 0
          },
          "flush" : {
            "total" : 1,
            "total_time" : "18ms",
            "total_time_in_millis" : 18
          }
        } ]
      }
    }
  }
}

_segments

{
  "ok" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "events" : {
      "shards" : {
        "0" : [ {
          "routing" : {
            "state" : "STARTED",
            "primary" : true,
            "node" : "vwbSeFjtQ3SqdbJzh8V-uA"
          },
          "num_committed_segments" : 20,
          "num_search_segments" : 20,
          "segments" : {
            "_4qm" : { "generation" : 6142, "num_docs" : 6730, "deleted_docs" : 67313, "size" : "29.8mb", "size_in_bytes" : 31342776, "committed" : true, "search" : true },
            "_ukk" : { "generation" : 39620, "num_docs" : 579895, "deleted_docs" : 232003, "size" : "203.9mb", "size_in_bytes" : 213859775, "committed" : true, "search" : true },
            "_1o32" : { "generation" : 77870, "num_docs" : 1131181, "deleted_docs" : 71964, "size" : "389.9mb", "size_in_bytes" : 408880404, "committed" : true, "search" : true },
            "_1o3e" : { "generation" : 77882, "num_docs" : 17179, "deleted_docs" : 15495, "size" : "9.9mb", "size_in_bytes" : 10401244, "committed" : true, "search" : true },
            "_1qse" : { "generation" : 81374, "num_docs" : 64627, "deleted_docs" : 1870, "size" : "16.5mb", "size_in_bytes" : 17305817, "committed" : true, "search" : true },
            "_1qyf" : { "generation" : 81591, "num_docs" : 9475, "deleted_docs" : 2073, "size" : "3.7mb", "size_in_bytes" : 3894229, "committed" : true, "search" : true },
            "_1r7a" : { "generation" : 81910, "num_docs" : 15972, "deleted_docs" : 2375, "size" : "5mb", "size_in_bytes" : 5267040, "committed" : true, "search" : true },
            "_1rck" : { "generation" : 82100, "num_docs" : 9818, "deleted_docs" : 6, "size" : "3.5mb", "size_in_bytes" : 3753253, "committed" : true, "search" : true },
            "_1re0" : { "generation" : 82152, "num_docs" : 16651, "deleted_docs" : 787, "size" : "4.6mb", "size_in_bytes" : 4896408, "committed" : true, "search" : true },
            "_1rhz" : { "generation" : 82295, "num_docs" : 14006, "deleted_docs" : 264, "size" : "4.7mb", "size_in_bytes" : 5016838, "committed" : true, "search" : true },
            "_1ri0" : { "generation" : 82296, "num_docs" : 479, "deleted_docs" : 0, "size" : "162.7kb", "size_in_bytes" : 166651, "committed" : true, "search" : true },
            "_1ri4" : { "generation" : 82300, "num_docs" : 266, "deleted_docs" : 0, "size" : "110.1kb", "size_in_bytes" : 112763, "committed" : true, "search" : true },
            "_1ri6" : { "generation" : 82302, "num_docs" : 250, "deleted_docs" : 0, "size" : "109.3kb", "size_in_bytes" : 111965, "committed" : true, "search" : true },
            "_1ri8" : { "generation" : 82304, "num_docs" : 153, "deleted_docs" : 0, "size" : "85.4kb", "size_in_bytes" : 87460, "committed" : true, "search" : true },
            "_1ria" : { "generation" : 82306, "num_docs" : 549, "deleted_docs" : 0, "size" : "190.7kb", "size_in_bytes" : 195340, "committed" : true, "search" : true },
            "_1riu" : { "generation" : 82326, "num_docs" : 261, "deleted_docs" : 0, "size" : "106kb", "size_in_bytes" : 108604, "committed" : true, "search" : true },
            "_1rj6" : { "generation" : 82338, "num_docs" : 533, "deleted_docs" : 0, "size" : "191.1kb", "size_in_bytes" : 195748, "committed" : true, "search" : true },
            "_1rj8" : { "generation" : 82340, "num_docs" : 276, "deleted_docs" : 0, "size" : "119.4kb", "size_in_bytes" : 122299, "committed" : true, "search" : true },
            "_1rjb" : { "generation" : 82343, "num_docs" : 251, "deleted_docs" : 0, "size" : "100.7kb", "size_in_bytes" : 103129, "committed" : true, "search" : true },
            "_1rjs" : { "generation" : 82360, "num_docs" : 579, "deleted_docs" : 0, "size" : "207.2kb", "size_in_bytes" : 212180, "committed" : true, "search" : true }
          }
        } ]
      }
    }
  }
}
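As a quick sanity check, the doc counts in the _status and _segments output above are internally consistent: the per-segment num_docs sum to the shard-level num_docs, and num_docs + deleted_docs equals max_doc. That suggests the segment metadata itself is intact even though a stored-fields chunk fails to decompress. A small Python check with the numbers copied from the output above:

```python
# Numbers copied from the _status and _segments responses in this thread.
status = {"num_docs": 1869131, "max_doc": 2263281, "deleted_docs": 394150}

segment_num_docs = [
    6730, 579895, 1131181, 17179, 64627, 9475, 15972, 9818, 16651, 14006,
    479, 266, 250, 153, 549, 261, 533, 276, 251, 579,
]

# Invariant 1: live docs + deleted docs == max_doc.
assert status["num_docs"] + status["deleted_docs"] == status["max_doc"]

# Invariant 2: per-segment live-doc counts sum to the shard-level count.
assert sum(segment_num_docs) == status["num_docs"]

print("doc counts are internally consistent")
```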

On Monday, February 25, 2013 6:22:31 AM UTC-8, simonw wrote:


Do you still have the corrupted index? If so, I'd really love to look at it
if possible. Can you provide it somehow? Contact me at
simon.willnauer@elasticsearch.com

thanks!!

simon

On Monday, February 25, 2013 7:10:18 PM UTC+1, Taras Shkvarchuk wrote:


One thing that would be interesting: what JVM are you using? Is there a
possibility of a JVM bug here?

simon


Hi,

we had the same problem over the weekend.

[2013-07-19 14:53:16,663][WARN ][index.merge.scheduler ] [kreplytssearch47-1] [replyts][7] failed to merge
java.lang.ArrayIndexOutOfBoundsException: 413166
    at org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:146)
    at org.apache.lucene.index.MergeState$DocMap$1.get(MergeState.java:86)
    at org.apache.lucene.codecs.MappingMultiDocsAndPositionsEnum.nextDoc(MappingMultiDocsAndPositionsEnum.java:107)
    at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:109)
    at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:164)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3709)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3313)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
[2013-07-19 14:53:16,731][WARN ][index.engine.robin ] [kreplytssearch47-1] [replyts][7] failed engine
org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: 413166
    at org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.handleMergeException(ConcurrentMergeSchedulerProvider.java:100)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 413166
    at org.apache.lucene.codecs.lucene40.BitVector.get(BitVector.java:146)
    at org.apache.lucene.index.MergeState$DocMap$1.get(MergeState.java:86)
    at org.apache.lucene.codecs.MappingMultiDocsAndPositionsEnum.nextDoc(MappingMultiDocsAndPositionsEnum.java:107)
    at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:109)
    at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:164)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3709)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3313)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
[2013-07-19 14:53:16,790][WARN ][cluster.action.shard ] [kreplytssearch47-1] sending failed shard for [replyts][7], node[lq6vXWmaSlyNFAagtRu6uQ], [R], s[STARTED], reason [engine failure, message [MergeException[java.lang.ArrayIndexOutOfBoundsException: 413166]; nested: ArrayIndexOutOfBoundsException[413166]; ]]
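For intuition: BitVector.get is the live-docs lookup, so an ArrayIndexOutOfBoundsException: 413166 there means the merge asked about a doc id beyond the length of the segment's deletions bit vector, i.e. the on-disk deletes no longer match the segment. A rough Python analogy (a sketch, not Lucene's actual BitVector implementation):

```python
# Rough analogy to Lucene's BitVector.get during a merge: the merger walks
# doc ids up to the segment's maxDoc and asks the deletions bit vector
# whether each doc is deleted. If the bit vector is shorter than maxDoc
# (the .del file does not match the segment), the lookup goes out of
# bounds -- like the ArrayIndexOutOfBoundsException: 413166 above.

class BitVector:
    def __init__(self, size: int):
        self.bits = bytearray((size + 7) // 8)
        self.size = size

    def get(self, index: int) -> int:
        if index >= self.size:
            raise IndexError(f"doc {index} out of bounds for size {self.size}")
        return (self.bits[index >> 3] >> (index & 7)) & 1

# A consistent segment: bit vector sized to maxDoc.
ok = BitVector(500_000)
assert ok.get(413_166) == 0  # in range, not deleted

# A mismatched/corrupt segment: bit vector shorter than maxDoc.
bad = BitVector(400_000)
try:
    bad.get(413_166)
except IndexError as e:
    print("merge would fail:", e)
```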

The cluster is actually running 0.90.1 with this JVM:

java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

The index was originally created with 0.20.2; only the software was upgraded
afterwards.

I had to remove the single shard and replace it with an empty one to be able
to recover.

One important thing to add: after the exception happened, the cluster
started to behave badly:

  • node A tried to send the failed shard to node B
  • while recovering on B, the recovery failed with the same exception
  • B renamed the broken files
  • A tried to send the failed shard to node B again
  • the recovery failed again
  • B renamed the files again
  • and so on, in a loop

This didn't stop until the disk was full on node B.
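The loop described above is easy to model: each failed recovery leaves the renamed broken files behind and retries, so without cleanup the target node's disk usage only grows. A toy simulation (made-up sizes and file names, not ES code):

```python
def simulate_recovery_loop(shard_size, disk_capacity):
    """Retry recovery forever, keeping the renamed broken files each time,
    until the next full shard copy no longer fits on disk."""
    used = 0
    attempts = 0
    leftover_files = []
    while used + shard_size <= disk_capacity:
        attempts += 1
        used += shard_size  # copy the shard over; recovery then fails
        # the broken copy is renamed and kept, never deleted
        leftover_files.append(f"replyts_7.broken.{attempts}")
    return attempts, used, leftover_files

attempts, used, files = simulate_recovery_loop(shard_size=10, disk_capacity=95)
print(attempts, used)  # 9 90 -- nine broken copies pile up before the disk is full
```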

Cheers
Simon

PS: Hi Simon.. so we meet again ;).. are you still in touch with the old
TFH profs?

On Monday, February 25, 2013 7:34:41 PM UTC+1, simonw wrote:

One thing that would be interesting: which JVM are you using? Is there a
possibility of a JVM bug here?

simon

On Monday, February 25, 2013 7:10:18 PM UTC+1, Taras Shkvarchuk wrote:

Not sure what type of info you're looking for, so feel free to e-mail me for
more details; I can probably even provide the index in question for
debugging. There are 420K root-level documents, all of which have small
nested documents. Re-indexing nearly the same data resulted in an index
without corruption.

Here are the _status and _segments calls.
{
  "ok" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "events" : {
      "index" : {
        "primary_size" : "673.3mb",
        "primary_size_in_bytes" : 706043580,
        "size" : "673.3mb",
        "size_in_bytes" : 706043580
      },
      "translog" : {
        "operations" : 0
      },
      "docs" : {
        "num_docs" : 1869131,
        "max_doc" : 2263281,
        "deleted_docs" : 394150
      },
      "merges" : {
        "current" : 0,
        "current_docs" : 0,
        "current_size" : "0b",
        "current_size_in_bytes" : 0,
        "total" : 1,
        "total_time" : "3.7s",
        "total_time_in_millis" : 3776,
        "total_docs" : 2263281,
        "total_size" : "0b",
        "total_size_in_bytes" : 0
      },
      "refresh" : {
        "total" : 1,
        "total_time" : "0s",
        "total_time_in_millis" : 0
      },
      "flush" : {
        "total" : 1,
        "total_time" : "18ms",
        "total_time_in_millis" : 18
      },
      "shards" : {
        "0" : [ {
          "routing" : {
            "state" : "STARTED",
            "primary" : true,
            "node" : "vwbSeFjtQ3SqdbJzh8V-uA",
            "relocating_node" : null,
            "shard" : 0,
            "index" : "events"
          },
          "state" : "STARTED",
          "index" : {
            "size" : "673.3mb",
            "size_in_bytes" : 706043580
          },
          "translog" : {
            "id" : 1361432592857,
            "operations" : 0
          },
          "docs" : {
            "num_docs" : 1869131,
            "max_doc" : 2263281,
            "deleted_docs" : 394150
          },
          "merges" : {
            "current" : 0,
            "current_docs" : 0,
            "current_size" : "0b",
            "current_size_in_bytes" : 0,
            "total" : 1,
            "total_time" : "3.7s",
            "total_time_in_millis" : 3776,
            "total_docs" : 2263281,
            "total_size" : "0b",
            "total_size_in_bytes" : 0
          },
          "refresh" : {
            "total" : 1,
            "total_time" : "0s",
            "total_time_in_millis" : 0
          },
          "flush" : {
            "total" : 1,
            "total_time" : "18ms",
            "total_time_in_millis" : 18
          }
        } ]
      }
    }
  }
}

_segments

{
  "ok" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "events" : {
      "shards" : {
        "0" : [ {
          "routing" : {
            "state" : "STARTED",
            "primary" : true,
            "node" : "vwbSeFjtQ3SqdbJzh8V-uA"
          },
          "num_committed_segments" : 20,
          "num_search_segments" : 20,
          "segments" : {
            "_4qm"  : { "generation" : 6142,  "num_docs" : 6730,    "deleted_docs" : 67313,  "size" : "29.8mb",  "size_in_bytes" : 31342776,  "committed" : true, "search" : true },
            "_ukk"  : { "generation" : 39620, "num_docs" : 579895,  "deleted_docs" : 232003, "size" : "203.9mb", "size_in_bytes" : 213859775, "committed" : true, "search" : true },
            "_1o32" : { "generation" : 77870, "num_docs" : 1131181, "deleted_docs" : 71964,  "size" : "389.9mb", "size_in_bytes" : 408880404, "committed" : true, "search" : true },
            "_1o3e" : { "generation" : 77882, "num_docs" : 17179,   "deleted_docs" : 15495,  "size" : "9.9mb",   "size_in_bytes" : 10401244,  "committed" : true, "search" : true },
            "_1qse" : { "generation" : 81374, "num_docs" : 64627,   "deleted_docs" : 1870,   "size" : "16.5mb",  "size_in_bytes" : 17305817,  "committed" : true, "search" : true },
            "_1qyf" : { "generation" : 81591, "num_docs" : 9475,    "deleted_docs" : 2073,   "size" : "3.7mb",   "size_in_bytes" : 3894229,   "committed" : true, "search" : true },
            "_1r7a" : { "generation" : 81910, "num_docs" : 15972,   "deleted_docs" : 2375,   "size" : "5mb",     "size_in_bytes" : 5267040,   "committed" : true, "search" : true },
            "_1rck" : { "generation" : 82100, "num_docs" : 9818,    "deleted_docs" : 6,      "size" : "3.5mb",   "size_in_bytes" : 3753253,   "committed" : true, "search" : true },
            "_1re0" : { "generation" : 82152, "num_docs" : 16651,   "deleted_docs" : 787,    "size" : "4.6mb",   "size_in_bytes" : 4896408,   "committed" : true, "search" : true },
            "_1rhz" : { "generation" : 82295, "num_docs" : 14006,   "deleted_docs" : 264,    "size" : "4.7mb",   "size_in_bytes" : 5016838,   "committed" : true, "search" : true },
            "_1ri0" : { "generation" : 82296, "num_docs" : 479,     "deleted_docs" : 0,      "size" : "162.7kb", "size_in_bytes" : 166651,    "committed" : true, "search" : true },
            "_1ri4" : { "generation" : 82300, "num_docs" : 266,     "deleted_docs" : 0,      "size" : "110.1kb", "size_in_bytes" : 112763,    "committed" : true, "search" : true },
            "_1ri6" : { "generation" : 82302, "num_docs" : 250,     "deleted_docs" : 0,      "size" : "109.3kb", "size_in_bytes" : 111965,    "committed" : true, "search" : true },
            "_1ri8" : { "generation" : 82304, "num_docs" : 153,     "deleted_docs" : 0,      "size" : "85.4kb",  "size_in_bytes" : 87460,     "committed" : true, "search" : true },
            "_1ria" : { "generation" : 82306, "num_docs" : 549,     "deleted_docs" : 0,      "size" : "190.7kb", "size_in_bytes" : 195340,    "committed" : true, "search" : true },
            "_1riu" : { "generation" : 82326, "num_docs" : 261,     "deleted_docs" : 0,      "size" : "106kb",   "size_in_bytes" : 108604,    "committed" : true, "search" : true },
            "_1rj6" : { "generation" : 82338, "num_docs" : 533,     "deleted_docs" : 0,      "size" : "191.1kb", "size_in_bytes" : 195748,    "committed" : true, "search" : true },
            "_1rj8" : { "generation" : 82340, "num_docs" : 276,     "deleted_docs" : 0,      "size" : "119.4kb", "size_in_bytes" : 122299,    "committed" : true, "search" : true },
            "_1rjb" : { "generation" : 82343, "num_docs" : 251,     "deleted_docs" : 0,      "size" : "100.7kb", "size_in_bytes" : 103129,    "committed" : true, "search" : true },
            "_1rjs" : { "generation" : 82360, "num_docs" : 579,     "deleted_docs" : 0,      "size" : "207.2kb", "size_in_bytes" : 212180,    "committed" : true, "search" : true }
          }
        } ]
      }
    }
  }
}
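One thing worth noting about the listings: the per-segment counts add up exactly to the totals _status reports, so the segment-level metadata is at least self-consistent, which suggests the corruption sits inside a segment's data rather than in the bookkeeping. A quick Python check, using (num_docs, deleted_docs) pairs copied from the _segments output above:

```python
# (num_docs, deleted_docs) per segment, transcribed from the listing above
segments = {
    "_4qm": (6730, 67313), "_ukk": (579895, 232003), "_1o32": (1131181, 71964),
    "_1o3e": (17179, 15495), "_1qse": (64627, 1870), "_1qyf": (9475, 2073),
    "_1r7a": (15972, 2375), "_1rck": (9818, 6), "_1re0": (16651, 787),
    "_1rhz": (14006, 264), "_1ri0": (479, 0), "_1ri4": (266, 0),
    "_1ri6": (250, 0), "_1ri8": (153, 0), "_1ria": (549, 0),
    "_1riu": (261, 0), "_1rj6": (533, 0), "_1rj8": (276, 0),
    "_1rjb": (251, 0), "_1rjs": (579, 0),
}
num_docs = sum(n for n, _ in segments.values())
deleted_docs = sum(d for _, d in segments.values())
print(num_docs, deleted_docs, num_docs + deleted_docs)
# 1869131 394150 2263281 -- matches num_docs, deleted_docs and max_doc in _status
```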

On Monday, February 25, 2013 6:22:31 AM UTC-8, simonw wrote:

Hey Taras,

it would be awesome if you could provide some more info about your
problem so we can fix it quickly

simon

On Monday, February 25, 2013 9:42:20 AM UTC+1, simonw wrote:

hey,

can you give some more background here, like did you start this on an
old index, i.e. an index you created with a previous version of ES / Lucene?
Can you reproduce this?

simon

On Monday, February 25, 2013 7:54:46 AM UTC+1, Taras Shkvarchuk wrote:

While testing the master build with lucene 4.1, I got a permanent
merge error filling up my logs.
Has anyone else seen it? Anything I can provide to help debug?

[2013-02-24 22:43:38,863][WARN ][index.merge.scheduler ] [Enchantress] [events][0] failed to merge
java.lang.ArrayIndexOutOfBoundsException
	at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)
	at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$ChunkIterator.decompress(CompressingStoredFieldsReader.java:388)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:368)
	at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:283)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3698)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3303)
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401)
	at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:90)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478)
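For context on the LZ4.decompress frame: LZ4 decoding copies runs of literal bytes and back-references into a destination buffer sized from the stored chunk metadata, so if the compressed bytes on disk are corrupt, a copy can run past the end of that buffer and raise exactly this kind of ArrayIndexOutOfBoundsException. A toy Python decoder in the same spirit (not the real LZ4 block format):

```python
def toy_lz4_decompress(src, dest_len):
    """Toy LZ4-style decoder: a stream of literal runs and back-references.
    A corrupt match length or offset makes it write past dest_len, the
    analogue of the AIOOBE in LZ4.decompress above."""
    dest = [0] * dest_len
    pos = 0
    for op, *args in src:
        if op == "lit":            # copy literal bytes into the output
            for b in args[0]:
                dest[pos] = b      # raises IndexError if pos >= dest_len
                pos += 1
        elif op == "match":        # copy `length` bytes from `offset` back
            offset, length = args
            for _ in range(length):
                dest[pos] = dest[pos - offset]
                pos += 1
    return dest

# well-formed stream: two literals, then repeat them via a back-reference
ok = toy_lz4_decompress([("lit", [1, 2]), ("match", 2, 2)], dest_len=4)
print(ok)  # [1, 2, 1, 2]
```

With a corrupt stream, e.g. `("match", 2, 5)` against the same 4-byte buffer, the copy loop runs off the end and raises IndexError, which is the failure mode this trace shows in Java.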

Thanks!

-Taras


hey simon!

good to meet you again, long time no see! Is there any chance we can get
hold of this corrupted shard? It would be awesome to get it and reproduce
the problem somehow. You can contact me directly at
simon.willnauer@elasticsearch.com though!

Are you still around in Berlin? Wanna meet for a coffee at some point?

simon

On Wednesday, July 24, 2013 8:47:14 AM UTC+2, Simon Effenberg wrote:

Hi,

we have had the same problem on the weekend.
