Hi All,
I'm having trouble understanding why the synonym filter sometimes increments token positions and sometimes doesn't. This causes matching problems with match_phrase queries.
You might try replacing the "synonym" token filter with "synonym_graph" followed by "flatten_graph"? These newer filters are included (though marked as beta) in your ES version.
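A minimal sketch of what that analyzer chain could look like, written as the Python dict you would send as the index-creation body. The names `my_synonyms` and `my_analyzer` and the synonym rules are hypothetical placeholders; only the filter types (`synonym_graph`, `flatten_graph`) and the general settings layout come from ES itself.

```python
import json

# Hypothetical names: "my_synonyms" and "my_analyzer" are placeholders,
# and the synonym rules below are made-up examples.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["laptop, notebook", "tv => television"],
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # synonym_graph emits a token graph; flatten_graph
                    # collapses that graph so it can be indexed.
                    "filter": ["lowercase", "my_synonyms", "flatten_graph"],
                }
            },
        }
    }
}

print(json.dumps(index_body, indent=2))
```

The body could be sent with `PUT /<index-name>`; whether it avoids the position problem still depends on which filters run before the synonym step.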
Part of my reason for suggesting this is that SynonymFilter is now deprecated in favor of SynonymGraphFilter in Lucene, although I'm not sure there's necessarily a 1:1 correspondence between the "synonym" filter type in ES and SynonymFilter in Lucene.
One caveat: although "synonym_graph" is no longer marked as beta in current ES, and although SynonymFilter is deprecated in Lucene, the current ES docs (7.3) still recommend using "synonym" rather than "synonym_graph" at index time. I'm not sure whether the change I'm suggesting would have unintended effects; perhaps others could weigh in on that?
I've tried that too, but the same behaviour occurs: token positions get incremented when the input token stream has multiple tokens at the same position. (The same thing happens when the synonym filter is placed after a word delimiter filter.)
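To make the symptom concrete, here is a small pure-Python simulation (not Lucene code) of how absolute positions are derived from position increments, and why a filter that turns a stacked token's increment of 0 into 1 breaks match_phrase. The token stream is a hypothetical word-delimiter-style split of "wi-fi router"; real filter output order and offsets may differ.

```python
from typing import Dict, List, Set, Tuple

# (term, position_increment) pairs, modelled on Lucene's position increments.
# An increment of 0 means "stacked on the previous token's position".
Token = Tuple[str, int]


def positions(stream: List[Token]) -> List[Tuple[str, int]]:
    """Convert increments into absolute positions (the first token lands at 0)."""
    pos = -1
    out = []
    for term, inc in stream:
        pos += inc
        out.append((term, pos))
    return out


def phrase_matches(indexed: List[Tuple[str, int]], phrase: List[str]) -> bool:
    """True if the phrase terms occur at consecutive positions."""
    by_term: Dict[str, Set[int]] = {}
    for term, pos in indexed:
        by_term.setdefault(term, set()).add(pos)
    return any(
        all(start + i in by_term.get(t, set()) for i, t in enumerate(phrase))
        for start in by_term.get(phrase[0], set())
    )


# Hypothetical stream: "wifi" is stacked on "wi" (increment 0).
good = [("wi", 1), ("wifi", 0), ("fi", 1), ("router", 1)]
# Same terms after a filter that bumps the stacked token's increment to 1.
broken = [("wi", 1), ("wifi", 1), ("fi", 1), ("router", 1)]

phrase = ["wi", "fi", "router"]
print(positions(good), phrase_matches(positions(good), phrase))
print(positions(broken), phrase_matches(positions(broken), phrase))
```

In the broken stream every token after "wifi" is shifted by one, so "wi" and "fi" are no longer adjacent and the phrase fails.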
I also tried newer versions: placing a synonym filter after ngram (and other filters like word delimiter; more details in the link below) is not allowed from ES 7.x onward, and it produces a warning in ES 6.x.
My use case is replacing tokens using a dictionary (all entries are single words).
Is there any way to get this token-replacement behaviour after filters that can produce multiple tokens at the same position, while still preserving token positions after replacement?
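The behaviour being asked for amounts to a one-to-one term substitution that never touches the position increment. Below is a pure-Python sketch of that contract using the same (term, increment) model; the dictionary entries are hypothetical, and this is not an Elasticsearch or Lucene API. Whether a built-in ES filter does exactly this after ngram or word delimiter is the open question here.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int]  # (term, position_increment)


def replace_terms(stream: List[Token], dictionary: Dict[str, str]) -> List[Token]:
    """Swap each term for its dictionary entry; increments pass through unchanged."""
    return [(dictionary.get(term, term), inc) for term, inc in stream]


def positions(stream: List[Token]) -> List[int]:
    """Absolute positions derived from increments (first token at 0)."""
    pos, out = -1, []
    for _, inc in stream:
        pos += inc
        out.append(pos)
    return out


# Hypothetical input: "wifi" is stacked on "wi" with increment 0.
stream = [("wi", 1), ("wifi", 0), ("fi", 1), ("router", 1)]
# Hypothetical single-word dictionary.
dictionary = {"wifi": "wireless", "router": "gateway"}

replaced = replace_terms(stream, dictionary)
print(replaced)
print(positions(stream), positions(replaced))
```

Because each input token yields exactly one output token with the same increment, stacked tokens stay stacked and every position is preserved.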