Position getting incremented in Synonym filter when used after edge_ngram filter

pappu1 · August 22, 2019, 12:25pm

Hi All,
I'm having trouble understanding why the synonym filter sometimes increments token positions and sometimes doesn't. This is causing matching issues with match_phrase queries.

I've posted the full details on Stack Overflow: https://stackoverflow.com/questions/57602095/different-position-incremental-behaviour-of-synonym-filter

I'm using ES version 5.6.8

Any help would be greatly appreciated.

mgibney · August 23, 2019, 4:09pm

You might try replacing the "synonym" token filter with "synonym_graph" followed by "flatten_graph". These newer filters are included (though marked as beta) in your ES version.

Part of my reason for suggesting this is that SynonymFilter is now deprecated in favor of SynonymGraphFilter in Lucene, although I'm not sure whether there's necessarily a 1:1 correspondence between the "synonym" filter type in ES and SynonymFilter in Lucene.

One caveat: although "synonym_graph" is no longer marked as beta in current ES, and despite the fact that SynonymFilter is marked as deprecated in Lucene, the current ES docs (7.3) still recommend preferring "synonym" over "synonym_graph" at index time. I'm not sure whether the change I'm suggesting would have unintended effects, but perhaps others could weigh in on that question?
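For reference, the filter swap suggested above might look roughly like the sketch below, written as a Python dict that can be dumped to JSON for index creation. The names my_edge_ngram, my_synonyms, my_flatten and my_index_analyzer, the gram sizes, and the sample synonym rule are all made up for illustration and are not taken from the original question. Whether ES accepts this chain may depend on the version.

```python
import json


def build_index_settings(synonyms):
    """Build index settings whose analyzer applies edge_ngram,
    then synonym_graph, then flatten_graph (all names hypothetical)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Assumed gram sizes, for illustration only.
                    "my_edge_ngram": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 10,
                    },
                    # synonym_graph in place of the older synonym filter.
                    "my_synonyms": {
                        "type": "synonym_graph",
                        "synonyms": synonyms,
                    },
                    # flatten_graph makes the graph output indexable.
                    "my_flatten": {"type": "flatten_graph"},
                },
                "analyzer": {
                    "my_index_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "my_edge_ngram",
                            "my_synonyms",
                            "my_flatten",
                        ],
                    }
                },
            }
        }
    }


# Hypothetical single-word replacement rule.
settings = build_index_settings(["tv => television"])

if __name__ == "__main__":
    print(json.dumps(settings, indent=2))
```

The printed JSON could be used as the body of an index-creation request, and running the _analyze API against the resulting analyzer would show which position each token ends up with.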

pappu1 · August 23, 2019, 5:29pm

I have tried that too, but I see similar behaviour. Token positions get incremented when the input token stream has multiple tokens at the same position. (The same thing happens when the synonym filter is used after the word delimiter filter.)

I also tried newer versions: placing the synonym filter after ngram (and other filters such as word delimiter; more details in the link below) is not allowed as of ES 7.x and produces a warning in ES 6.x.

I found a link about the deprecation of using the synonym filter after token filters that can produce multiple tokens at the same position: https://github.com/elastic/elasticsearch/pull/34331

My use case is to replace tokens using a dictionary (the entries are single words).
Is there any way to do this token replacement after filters that can produce multiple tokens at the same position, while still preserving the positions of the replaced tokens?
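To make the desired behaviour concrete, here is a small plain-Python model of it. This is not an Elasticsearch API, only an illustration: tokens are assumed to be (term, position) pairs, and the sample tokens and dictionary are made up. Each term is swapped through the dictionary while its position is left untouched, even when several tokens share a position.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int]  # (term, position); a simplified stand-in for a token stream


def replace_terms(tokens: List[Token], dictionary: Dict[str, str]) -> List[Token]:
    """Replace each term found in the dictionary, keeping its position as-is."""
    return [(dictionary.get(term, term), position) for term, position in tokens]


# Hypothetical edge-ngram-like output: several tokens stacked per position.
tokens = [("qu", 0), ("qui", 0), ("quick", 0), ("fo", 1), ("fox", 1)]

# Hypothetical single-word replacement dictionary.
replaced = replace_terms(tokens, {"quick": "fast"})

if __name__ == "__main__":
    for term, position in replaced:
        print(position, term)
```

In this model "quick" becomes "fast" but stays at position 0, and "fox" stays at position 1, which is the output I would like to get from the analyzer.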

system · September 20, 2019, 5:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
