Greetings,
I have a complex integration test that has been failing consistently since we upgraded the Elasticsearch cluster to 6.3.0 (Lucene 7.3.1).
The exact same test passes against an Elasticsearch cluster in version 6.2.4 (Lucene 7.2.1).
Basically, the test submits concurrent indexing and bulk indexing requests. At some point a merge exception is raised, which results in an Elasticsearch shard failure (cluster health status is RED).
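To give an idea of the load pattern, here is a minimal sketch in Python. It is not the actual test code. The index name `nuxeo` and the type `doc` come from the logs below, while the document shape, the request counts, and the `send` callback are assumptions made up for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor

INDEX = "nuxeo"   # index name taken from the log lines
DOC_TYPE = "doc"  # mapping type taken from the update_mapping log line


def make_doc(i):
    # Hypothetical document shape; only the field name "note:note" is
    # inferred from the "note:note.fulltext" sub-field in the exception.
    return {"note:note": "note body %d" % i}


def bulk_body(start, count):
    """Build an NDJSON _bulk payload: one action line plus one source line per doc."""
    lines = []
    for i in range(start, start + count):
        action = {"index": {"_index": INDEX, "_type": DOC_TYPE, "_id": str(i)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(make_doc(i)))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


def run_load(send, singles=50, bulks=5, bulk_size=20, workers=8):
    """Submit single-document and bulk requests concurrently through send().

    send(method, path, body) is a hypothetical callback that performs the
    HTTP request; it is injected so the sketch stays client-agnostic.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i in range(singles):
            path = "/%s/%s/%d" % (INDEX, DOC_TYPE, i)
            futures.append(pool.submit(send, "PUT", path, json.dumps(make_doc(i))))
        for b in range(bulks):
            start = singles + b * bulk_size
            futures.append(pool.submit(send, "POST", "/_bulk", bulk_body(start, bulk_size)))
        # Propagate any exception raised by a request and collect the results.
        return [f.result() for f in futures]
```

Passing a `send` function that wraps a real HTTP client would drive a cluster with this mix of requests. I do not claim that this sketch alone triggers the merge failure.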
The Elasticsearch and OS logs look normal until the merge exception below.
So far I have not been able to reproduce the problem outside of this dockerized test environment, which runs on a specific slave.
I would like some guidance to help categorize this problem.
Regards
ben
[2018-06-26T10:12:36,417][INFO ][o.e.c.m.MetaDataMappingService] [dZQ-7Yb] [nuxeo/6eYqvs-BS0K4X01NkwrVRQ] update_mapping [doc]
[2018-06-26T10:12:59,907][WARN ][o.e.i.e.Engine ] [dZQ-7Yb] [nuxeo][0] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalStateException: found existing value for PerFieldPostingsFormat.format, field=note:note.fulltext, old=Lucene50, new=Lucene50
at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2113) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:724) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: java.lang.IllegalStateException: found existing value for PerFieldPostingsFormat.format, field=note:note.fulltext, old=Lucene50, new=Lucene50
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.buildFieldsGroupMapping(PerFieldPostingsFormat.java:226) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:152) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:230) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:115) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4443) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4083) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:99) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661) ~[lucene-core-7.3.1.jar:7.3.1 ae0705edb59eaa567fe13ed3a222fdadc7153680 - caomanhdat - 2018-05-09 09:27:24]
[2018-06-26T10:12:59,913][WARN ][o.e.i.c.IndicesClusterStateService] [dZQ-7Yb] [[nuxeo][0]] marking and sending shard failed due to [shard failure, reason [merge failed]]
...
[2018-06-26T10:14:55,500][INFO ][o.e.c.r.a.AllocationService] [dZQ-7Yb] Cluster health status changed from [GREEN] to [RED] (reason: [shards failed [[nuxeo][0]] ...]).
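When scanning several runs of these logs, the interesting parts of the exception message can be extracted with a short script. The following is only a sketch, assuming the message keeps the exact wording seen in the trace above; the function name `parse_merge_failure` is made up for illustration.

```python
import re

# Matches the IllegalStateException message as it appears in the trace above.
PATTERN = re.compile(
    r"found existing value for (?P<attr>[\w.]+), "
    r"field=(?P<field>[^,]+), old=(?P<old>\w+), new=(?P<new>\w+)"
)


def parse_merge_failure(line):
    """Return the attribute, field, old and new values, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None
```

Applied to the "Caused by" line above, it yields the field `note:note.fulltext` with `old` and `new` both equal to `Lucene50`.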