I'm having trouble understanding why the synonym filter sometimes increments token positions and sometimes doesn't. This is causing matching problems with match_phrase queries.
I've posted the full details on Stack Overflow: https://stackoverflow.com/questions/57602095/different-position-incremental-behaviour-of-synonym-filter
I'm using ES version 5.6.8.
Any help would be greatly appreciated.
You might try replacing the "synonym" token filter with "synonym_graph" followed by "flatten_graph". Both of these newer filters are included in your ES version, though they are marked as beta.
Part of my reason for suggesting this is that SynonymFilter is now deprecated in Lucene in favor of SynonymGraphFilter, although I'm not sure there is necessarily a 1:1 correspondence between the "synonym" filter type in ES and SynonymFilter in Lucene.
One caveat: although "synonym_graph" is no longer marked as beta in current ES, and SynonymFilter is marked as deprecated in Lucene, the current ES docs (7.3) still recommend using "synonym" rather than "synonym_graph" at index time. I'm not sure whether the change I'm suggesting would have unintended effects; perhaps others could weigh in on that question?
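A minimal sketch of what the suggested chain might look like in the index settings, built as a Python dict and printed as JSON. The filter name `my_synonyms`, the analyzer name `my_index_analyzer`, the `standard` tokenizer and the synonym rule are all hypothetical placeholders; only the filter types "synonym_graph" and "flatten_graph" come from the suggestion above.

```python
import json

# Hypothetical index settings sketching the suggested chain: "synonym_graph"
# followed by "flatten_graph". The filter/analyzer names and the synonym rule
# are placeholders; substitute your own tokenizer and filters.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym_graph",  # used instead of the "synonym" type
                    "synonyms": ["wifi => wireless"],  # hypothetical rule
                }
            },
            "analyzer": {
                "my_index_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # flatten_graph comes right after synonym_graph so the
                    # token graph is flattened before it is indexed
                    "filter": ["lowercase", "my_synonyms", "flatten_graph"],
                }
            },
        }
    }
}

# Print the JSON body that could be sent when creating the index.
print(json.dumps(settings, indent=2))
```

The printed body could, for example, be used when creating an index; whether this chain changes the position behaviour for your particular filters is untested here.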
I have tried that too, but the same behaviour occurs: token positions get incremented when the input token stream has multiple tokens at the same position. (The same thing happens when the synonym filter is placed after a word delimiter filter.)
I also tried newer versions: placing the synonym filter after ngram (or other filters such as word delimiter; more details in the link below) produces a warning in ES 6.x and is no longer allowed as of ES 7.x.
I found this pull request, which describes the deprecation of synonym filters placed after token filters that can produce multiple tokens at the same position: https://github.com/elastic/elasticsearch/pull/34331
My use case is replacing tokens using a dictionary of single-word replacements.
Is there any way to get this token-replacement behaviour after filters that can produce multiple tokens at the same position, while still preserving the token positions after replacement?
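To make the desired behaviour concrete, here is a toy Python model (an assumption for illustration, not Lucene or Elasticsearch code) of a one-to-one dictionary replacement that keeps every token's position. Tokens are modelled as hypothetical `(term, position)` pairs, and the example stream and dictionary are invented.

```python
from typing import Dict, List, Tuple

# A token is modelled as (term, position). This is a simplified toy model,
# not Lucene's actual TokenStream/attribute API.
Token = Tuple[str, int]


def replace_terms(tokens: List[Token], dictionary: Dict[str, str]) -> List[Token]:
    """Replace each term found in `dictionary` one-to-one, keeping its position."""
    return [(dictionary.get(term, term), pos) for term, pos in tokens]


# Hypothetical stream with two tokens stacked at position 0, as a filter such
# as word delimiter might produce; the terms and positions are illustrative.
stream = [("wi", 0), ("wifi", 0), ("fi", 1)]
replacements = {"wifi": "wireless"}  # hypothetical single-word dictionary

result = replace_terms(stream, replacements)
print(result)  # [('wi', 0), ('wireless', 0), ('fi', 1)]

# Positions are unchanged, so phrase matching sees the same layout as before.
assert [pos for _, pos in result] == [pos for _, pos in stream]
```

Because each term maps to exactly one term, stacked tokens stay stacked and no position is incremented, which is the invariant the question asks for.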