Hi,
I'm designing an analysis chain that combines a shingle filter and a synonym graph filter with other filters (lowercase, stemmer, etc.).
The purpose is to map tokens to a custom dictionary and to support overlapping tokens.
For example:
Dictionary:
red => color_1
red shift => company_1
Query:
Red shift
Desired analysis:
[color_1,company_1]
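
To make the desired behaviour concrete, here is a minimal Python sketch of the mapping I'm after. It is only a toy model, not Elasticsearch or Lucene code: the `analyze` function, the `DICTIONARY` name and the `max_shingle_size` parameter are made up for illustration, and it assumes whitespace tokenization plus lowercasing only (no stemming).

```python
# Toy model of the desired analysis: every unigram and every shingle
# (word n-gram) is looked up in the dictionary, so overlapping entries
# such as "red" and "red shift" can both match.
DICTIONARY = {
    "red": "color_1",
    "red shift": "company_1",
}


def analyze(text, dictionary, max_shingle_size=2):
    """Return the dictionary ids matched by any 1..max_shingle_size word span."""
    tokens = text.lower().split()  # assumption: whitespace tokenizer + lowercase
    matches = []
    for start in range(len(tokens)):
        for size in range(1, max_shingle_size + 1):
            end = start + size
            if end > len(tokens):
                break
            candidate = " ".join(tokens[start:end])
            if candidate in dictionary:
                matches.append(dictionary[candidate])
    return matches


print(analyze("Red shift", DICTIONARY))
```

Both the single token and the two-word shingle starting at the same position produce a match, which is the overlap I want the real analyzer to preserve.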
Now the issue is that when Elasticsearch parses the synonym list, it uses the same analyzer to analyze the terms in the dictionary. Link to source
The shingle token filter emits shingles (word n-grams) with a position increment of 0. Link to source
The SynonymMap throws an exception for tokens whose position increment is != 1. Link to source
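
The conflict between these three points can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not the actual Lucene implementation: `shingle` and `parse_synonym_term` are hypothetical stand-ins for the shingle filter (with unigrams enabled) and for the check the synonym parser performs, and the error text is only illustrative.

```python
def shingle(tokens, max_shingle_size=2):
    """Toy shingle filter: returns (term, position_increment) pairs.

    Unigrams advance the position by 1; shingles that start at the same
    position as the preceding unigram are stacked with increment 0.
    """
    out = []
    for start in range(len(tokens)):
        out.append((tokens[start], 1))
        for size in range(2, max_shingle_size + 1):
            end = start + size
            if end > len(tokens):
                break
            out.append((" ".join(tokens[start:end]), 0))
    return out


def parse_synonym_term(term, analyzer):
    """Toy synonym parser: rejects any token whose increment is not 1."""
    parts = []
    for text, pos_inc in analyzer(term.lower().split()):
        if pos_inc != 1:
            raise ValueError(
                f"term {term!r} analyzed to token {text!r} "
                f"with position increment {pos_inc}"
            )
        parts.append(text)
    return parts


print(shingle(["red", "shift"]))
print(parse_synonym_term("red", shingle))  # single word: no shingle emitted

try:
    parse_synonym_term("red shift", shingle)
except ValueError as err:
    print("rejected:", err)
```

In this model the single-word entry parses fine, while the two-word entry is rejected as soon as the stacked shingle shows up, which mirrors what happens to the multi-word rule in my dictionary.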
I tried to work around this issue by changing the Lucene code: I used a different analyzer to parse the synonyms.