Are there any plans to add Lucene's FixBrokenOffsetsFilter to the Elasticsearch token filters?

Starting with 7.0, Lucene checks whether the offsets in a token stream are broken and throws an exception if they are. The check was added in https://issues.apache.org/jira/browse/LUCENE-7626
This causes a lot of issues when word delimiter filters are combined with multi-word synonyms. Even with the latest version of Elasticsearch I sometimes get the error "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards".
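
To make the three conditions in that error message concrete, here is a minimal sketch in Python. It is not Lucene's code; the function name `first_broken_offset` is hypothetical, and it assumes that "offsets must not go backwards" means a token's start offset may not be smaller than the previous token's start offset.

```python
from typing import Iterable, Optional, Tuple


def first_broken_offset(offsets: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the first token with broken offsets, or None.

    Each item is a (start_offset, end_offset) pair in token stream order.
    Assumption: "going backwards" means start < previous token's start.
    """
    last_start = 0
    for index, (start, end) in enumerate(offsets):
        negative_start = start < 0
        end_before_start = end < start
        goes_backwards = start < last_start
        if negative_start or end_before_start or goes_backwards:
            return index
        last_start = start
    return None
```

For example, a stream whose second token starts before the first one, such as `[(5, 9), (0, 4)]`, would be reported as broken at index 1, which is the kind of stream a word delimiter filter followed by a multi-word synonym filter can produce.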

For backward compatibility with existing filters, FixBrokenOffsetsFilter was added in the same issue, https://issues.apache.org/jira/browse/LUCENE-7626, but I can't use it because there is no corresponding token filter in Elasticsearch. Are there any plans to add it?

I wasn't aware of FixBrokenOffsetsFilter until now. I quickly searched our GitHub repo for any mention of it and found none, so I think there are no plans to add it at the moment. If you think it should be added, please raise a GitHub issue so we can discuss it publicly. I'm not sure we should include it, since the filter seems to be a fallback for backward compatibility problems that would preferably be fixed in the analysis token filters themselves. To help guide the discussion, please include your use case and explain why you cannot address the broken token streams in any other way.
