Index "failed to merge"

Kenneth_Loafman · October 22, 2011, 7:39pm

Hi,

We got the following problem on the 15th:
[2011-10-15 16:10:19,023][WARN ][index.merge.scheduler ] [Rand, Daniel] [co0181ca0607][0] failed to merge
org.apache.lucene.index.CorruptIndexException: docs out of order (221 <= 221 )
at org.apache.lucene.index.FormatPostingsDocsWriter.addDoc(FormatPostingsDocsWriter.java:84)
at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:590)
at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:538)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:470)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4273)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3917)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:88)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

After that, it's just a one-line warning message every so often.

We have 2 shards and one replica, so the data should be safe, right?

How do I fix this?

...Thanks,
...Ken
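
For anyone hitting the same warning: in the log line above, [co0181ca0607][0] is the index name followed by the shard number, and the bracketed name before it is the node. The following is a minimal Python sketch for pulling those values out of a log file. The function name find_failed_merges and the regular expression are illustrative assumptions based only on the single log line quoted in this thread; they are not part of Elasticsearch.

```python
import re

# Assumed line shape, taken from the WARN line quoted in this thread:
# [timestamp][WARN ][index.merge.scheduler ] [node name] [index][shard] failed to merge
FAILED_MERGE = re.compile(
    r"\[(?P<node>[^\]]+)\]\s+\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s+failed to merge"
)


def find_failed_merges(lines):
    """Return a (node, index, shard) tuple for every 'failed to merge' line."""
    hits = []
    for line in lines:
        match = FAILED_MERGE.search(line)
        if match:
            hits.append(
                (match.group("node"), match.group("index"), int(match.group("shard")))
            )
    return hits
```

It expects each log entry on a single line, as it appears in the log file itself rather than as wrapped by an email client. Passing an open log file to find_failed_merges works, because iterating over a file yields its lines.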

kimchy · October 23, 2011, 12:26am

You can restart the node that causes this problem and have the shards on it
resync. This should not happen, though. Were there any other failures before it
happened? Can you gist the logs from around the time it first started?

Kenneth_Loafman_2 · October 23, 2011, 12:41pm

Looks like a restart caused the original problem:

https://gist.github.com/kwloafman/1307321

original-fail.txt

[2011-10-15 15:31:21,943][INFO ][node                     ] [Ghost Rider] {0.18.0-SNAPSHOT}[22364]: stopping ...
[2011-10-15 15:31:22,303][INFO ][node                     ] [Ghost Rider] {0.18.0-SNAPSHOT}[22364]: stopped
[2011-10-15 15:31:22,303][INFO ][node                     ] [Ghost Rider] {0.18.0-SNAPSHOT}[22364]: closing ...
[2011-10-15 15:31:22,753][INFO ][node                     ] [Ghost Rider] {0.18.0-SNAPSHOT}[22364]: closed
[2011-10-15 15:31:24,433][INFO ][node                     ] [Rand, Daniel] {0.18.0-SNAPSHOT}[25549]: initializing ...
[2011-10-15 15:31:24,443][INFO ][plugins                  ] [Rand, Daniel] loaded [], sites []
[2011-10-15 15:31:26,933][INFO ][node                     ] [Rand, Daniel] {0.18.0-SNAPSHOT}[25549]: initialized
[2011-10-15 15:31:26,933][INFO ][node                     ] [Rand, Daniel] {0.18.0-SNAPSHOT}[25549]: starting ...
[2011-10-15 15:31:27,013][INFO ][transport                ] [Rand, Daniel] bound_address {inet[/10.177.163.200:9300]}, publish_address {inet[/10.177.163.200:9300]}
[2011-10-15 15:31:30,193][INFO ][cluster.service          ] [Rand, Daniel] detected_master [Berzerker][_fA8lEb9RFOKsNgtXf2dKg][inet[/10.177.163.46:9300]], added {[Berzerker][_fA8lEb9RFOKsNgtXf2dKg][inet[/10.177.163.46:9300]],[Misty Knight][uS7LaaeaQCKMcGd7OqH9EA][inet[/10.177.166.64:9300]],}, reason: zen-disco-receive(from master [[Berzerker][_fA8lEb9RFOKsNgtXf2dKg][inet[/10.177.163.46:9300]]])

This file has been truncated.

restart-fail.txt

[2011-10-23 12:00:55,796][INFO ][node                     ] [Dragonfly] {0.18.0-SNAPSHOT}[19699]: stopping ...
[2011-10-23 12:00:56,246][INFO ][node                     ] [Dragonfly] {0.18.0-SNAPSHOT}[19699]: stopped
[2011-10-23 12:00:56,246][INFO ][node                     ] [Dragonfly] {0.18.0-SNAPSHOT}[19699]: closing ...
[2011-10-23 12:00:56,496][INFO ][node                     ] [Dragonfly] {0.18.0-SNAPSHOT}[19699]: closed
[2011-10-23 12:00:59,026][INFO ][node                     ] [Thing] {0.18.0-SNAPSHOT}[13232]: initializing ...
[2011-10-23 12:00:59,026][INFO ][plugins                  ] [Thing] loaded [], sites []
[2011-10-23 12:01:01,466][INFO ][node                     ] [Thing] {0.18.0-SNAPSHOT}[13232]: initialized
[2011-10-23 12:01:01,466][INFO ][node                     ] [Thing] {0.18.0-SNAPSHOT}[13232]: starting ...
[2011-10-23 12:01:01,546][INFO ][transport                ] [Thing] bound_address {inet[/10.177.163.200:9300]}, publish_address {inet[/10.177.163.200:9300]}
[2011-10-23 12:01:04,726][INFO ][cluster.service          ] [Thing] detected_master [Alhazred, Abdul][Yej5tjhGQ-KiQsLY8i1ATg][inet[/10.177.163.46:9300]], added {[Alhazred, Abdul][Yej5tjhGQ-KiQsLY8i1ATg][inet[/10.177.163.46:9300]],[Madame Menace][bbuEjFaBRCGzKuXYvlZO6w][inet[/10.177.166.64:9300]],}, reason: zen-disco-receive(from master [[Alhazred, Abdul][Yej5tjhGQ-KiQsLY8i1ATg][inet[/10.177.163.46:9300]]])

This file has been truncated.

I tried restarting the node, and no go:

https://gist.github.com/kwloafman/1307321 (restart-fail.txt, shown above)

kimchy · October 23, 2011, 8:17pm

I will look into the initial failure that might have caused it, but for now,
can you stop the node, delete the [co0181ca0607][0] shard location (under
data/nodes/0/indices/co0181ca0607/0), and then start it back up? That will
force the shard to be copied from the other node.
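
The step above can be sketched in Python as follows. This is only an illustration under stated assumptions: instead of deleting the shard directory, it moves it to a backup location outside the data directory, which is a more cautious variation and not what the reply literally says. The function name set_aside_shard, the backup_root parameter, and the node_ordinal default are hypothetical; the directory layout is the one quoted in the reply. It must only be run while the node is stopped.

```python
import shutil
import time
from pathlib import Path


def set_aside_shard(data_dir, index, shard, backup_root, node_ordinal=0):
    """Move one shard directory out of the data directory and return its new path.

    Assumes the layout <data_dir>/nodes/<node_ordinal>/indices/<index>/<shard>
    and that the node owning data_dir has already been stopped.
    """
    shard_dir = (
        Path(data_dir) / "nodes" / str(node_ordinal) / "indices" / index / str(shard)
    )
    if not shard_dir.is_dir():
        raise FileNotFoundError(f"shard directory not found: {shard_dir}")

    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    target = backup_root / f"{index}-{shard}-{int(time.time())}"
    if target.exists():
        # Refuse to merge into an earlier backup made in the same second.
        raise FileExistsError(f"backup target already exists: {target}")

    shutil.move(str(shard_dir), str(target))
    return target
```

For the shard in this thread, a call might look like set_aside_shard("data", "co0181ca0607", 0, "/tmp/shard-backup"), where both paths are placeholders. Once the node is started again and the shard has been copied back from the other node, the backup can be deleted.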

Kenneth_Loafman_2 · October 24, 2011, 11:30am

Thanks! That worked.
