AsciiFolding filter with preserve_original=true causes indexing error

Using Elasticsearch 5.6.2.

When I set preserve_original=true on an asciifolding filter and combine it with a word_delimiter filter in the same analyzer, I sometimes get the following error during indexing:

[2017-10-03T15:31:57,542][DEBUG][o.e.a.b.TransportShardBulkAction] [6akE529] [article-114][0] failed to execute bulk item (index) BulkShardRequest [[article-114][0]] containing [31] requests
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=191,endOffset=192,lastStartOffset=193 for field 'textNormal'
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:778) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:447) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:403) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:273) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:433) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1384) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1360) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:660) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:606) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:504) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:557) ~[elasticsearch-5.6.2.jar:5.6.2]
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:546) ~[elasticsearch-5.6.2.jar:5.6.2]
...
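To see which token trips this check, one option is to run the failing text through the `_analyze` API with the analyzer in question and scan the returned offsets. The sketch below mirrors the condition as worded in the exception message, not Lucene's actual source. It assumes each token is a dict with `start_offset` and `end_offset` keys, as in an `_analyze` response. The sample token texts and the end offset 197 are hypothetical; only 191, 192 and 193 come from the log above.

```python
def first_offset_violation(tokens):
    """Return (index, token) for the first token that breaks the offset rule, else None.

    Rule, as worded in the exception: startOffset must be non-negative,
    endOffset must be >= startOffset, and start offsets must not go backwards.
    """
    last_start = 0
    for i, tok in enumerate(tokens):
        start, end = tok["start_offset"], tok["end_offset"]
        if start < 0 or end < start or start < last_start:
            return i, tok
        last_start = start
    return None


# Hypothetical token stream shaped like the one in the log:
# the previous token started at 193, the next one claims 191-192.
sample = [
    {"token": "first", "start_offset": 193, "end_offset": 197},
    {"token": "second", "start_offset": 191, "end_offset": 192},
]

print(first_offset_violation(sample))
```

Tokens that share the same start offset (for example an original term and its folded variant) pass this check; only a start offset lower than the previous one is reported.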

The relevant analysis settings are as follows:

    "asciifolding_filter" : {
      "type" : "asciifolding",
      "preserve_original" : true
    }

    ...
    "word_delimiter_filter": {
      "type": "word_delimiter",
      "preserve_original": false,
      "split_on_numerics": false
    }
    ...
    "text_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding_filter",
        "word_delimiter_filter"
      ]
    }
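For anyone trying to reproduce this, the fragments above can be assembled into a complete index-creation body. This is only a sketch: it assumes the filters and the analyzer sit under `settings.analysis.filter` and `settings.analysis.analyzer`, and it leaves out whatever the elided `...` parts contain. The variable name `index_body` is made up for the example.

```python
import json

# Index-creation body assembled from the fragments in this post.
# Assumption: the elided parts ("...") hold nothing relevant to the error.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "asciifolding_filter": {
                    "type": "asciifolding",
                    "preserve_original": True,
                },
                "word_delimiter_filter": {
                    "type": "word_delimiter",
                    "preserve_original": False,
                    "split_on_numerics": False,
                },
            },
            "analyzer": {
                "text_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "asciifolding_filter",
                        "word_delimiter_filter",
                    ],
                },
            },
        }
    }
}

# Print the JSON that would be sent when creating a test index.
print(json.dumps(index_body, indent=2))
```

The mapping for the 'textNormal' field is not shown in the post, so it is not included here.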

The error goes away if I either (a) set preserve_original=false on the asciifolding filter or (b) remove the word_delimiter filter from the analyzer.

Thanks!
