Index "failed to merge"


(Kenneth Loafman) #1

Hi,

We got the following error on the 15th:
[2011-10-15 16:10:19,023][WARN ][index.merge.scheduler ] [Rand, Daniel] [co0181ca0607][0] failed to merge
org.apache.lucene.index.CorruptIndexException: docs out of order (221 <= 221 )
    at org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:84)
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:590)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:538)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:470)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4273)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3917)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:88)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

After that, it's just a one-line warning message every so often.

We have 2 shards and one replica, so the data should be safe, right?

How do I fix this?

...Thanks,
...Ken
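For readers hitting the same error: a Lucene merge rebuilds each term's posting list (the sorted list of document numbers containing that term) by concatenating the lists from the source segments, shifting each segment's local numbers by a running doc base. The `docs out of order (221 <= 221 )` message comes from `FormatPostingsDocsWriter.addDoc` when a document number is not strictly greater than the previous one; here the same number, 221, showed up twice. The Python below is a simplified model of that invariant, not Lucene's code: it ignores deleted documents, frequencies, positions and the on-disk encoding, and `CorruptIndexError` and `merge_postings` are hypothetical names.

```python
class CorruptIndexError(Exception):
    """Hypothetical stand-in for Lucene's CorruptIndexException (illustration only)."""


def merge_postings(segments):
    """Merge one term's posting lists from several segments.

    segments: list of (max_doc, doc_ids) tuples, where max_doc is the
    segment's document count and doc_ids are the segment-local document
    numbers that contain the term, in order. Deleted documents are ignored
    in this simplified model.
    """
    merged = []
    doc_base = 0  # offset of the current segment in the merged numbering
    last_doc_id = -1
    for max_doc, doc_ids in segments:
        for local_id in doc_ids:
            doc_id = doc_base + local_id
            # Merged doc IDs must strictly increase; a repeat or a step
            # backwards means the source data was inconsistent.
            if doc_id <= last_doc_id:
                raise CorruptIndexError(
                    f"docs out of order ({doc_id} <= {last_doc_id} )"
                )
            merged.append(doc_id)
            last_doc_id = doc_id
        doc_base += max_doc
    return merged
```

With made-up input, `merge_postings([(100, [3]), (200, [121, 121])])` raises `CorruptIndexError: docs out of order (221 <= 221 )`, because segment-local doc 121 appears twice and maps to merged doc 221 both times, while well-formed input such as `[(100, [3, 50]), (200, [0, 121])]` merges to `[3, 50, 100, 221]`.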


(Shay Banon) #2

You can restart the node that is causing this problem and let the shards on it
resync. This should not happen, though. Were there any other failures before it?
Can you gist the logs from around the time it started?

(Replying to Kenneth Loafman, Sat, Oct 22, 2011 at 9:39 PM)


(Kenneth Loafman-2) #3

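Looks like a restart caused the original problem:
https://gist.github.com/1307321#file_original_fail.txt

I tried restarting the node, but no go:
https://gist.github.com/1307321#file_restart_fail.txt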

(Replying to Shay Banon, Sat, Oct 22, 2011 at 7:26 PM)


(Shay Banon) #4

I will look into the initial failure that might have caused it, but for now,
can you stop the node, delete the [co0181ca0607][0] shard location (under
data/nodes/0/indices/co0181ca0607/0), and then start it back up? That will
force the shard to be copied from the other node.
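Shay's steps amount to: stop the node, remove that one shard's directory, restart, and let recovery copy the shard back from the node that still holds a good copy. A minimal Python sketch of the delete step follows. `shard_dir` and `delete_local_shard_copy` are hypothetical helpers, and the layout `nodes/<n>/indices/<index>/<shard>` is taken from the path quoted above; on a given install the data directory (and any extra level above `nodes`) may differ, so check the real path before deleting anything. The sketch assumes the node is already stopped and that another node holds a healthy copy of the shard.

```python
import shutil
from pathlib import Path


def shard_dir(data_root, index, shard, node_ordinal=0):
    """Build a shard's on-disk path, assuming the layout quoted in this
    thread: <data_root>/nodes/<node_ordinal>/indices/<index>/<shard>."""
    return Path(data_root) / "nodes" / str(node_ordinal) / "indices" / index / str(shard)


def delete_local_shard_copy(data_root, index, shard, node_ordinal=0):
    """Remove one local shard copy so the node re-copies it on restart.

    Only safe while the node is stopped and another node holds a healthy
    copy of the same shard; this function does not check either condition.
    """
    path = shard_dir(data_root, index, shard, node_ordinal)
    if not path.is_dir():
        # Refuse to guess: a wrong path here would mean deleting the wrong data.
        raise FileNotFoundError(f"no shard directory at {path}")
    shutil.rmtree(path)  # removes only this shard's directory tree
    return path
```

For the shard in this thread, it would be called as `delete_local_shard_copy("data", "co0181ca0607", 0)` before starting the node again; other shards of the same index are left in place.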

(Replying to Kenneth Loafman, Sun, Oct 23, 2011 at 2:41 PM)


(Kenneth Loafman-2) #5

Thanks! That worked.

(Replying to Shay Banon, Sun, Oct 23, 2011 at 3:17 PM)

