I'm designing an analysis chain that includes a shingle filter and a synonym graph filter, along with other filters such as lowercase and stemmer.
The purpose is to map tokens to a custom dictionary while supporting overlapping entries, for example:
red => color_1
red shift => company_1
The issue is that when Elasticsearch parses the synonym list, it analyzes the dictionary terms with the same analyzer chain (the tokenizer and the filters that precede the synonym filter). Link to source.
The shingle token filter produces n-grams with a position increment of 0, so each shingle is stacked on the unigram that starts at the same position. Link to source.
The SynonymMap parser throws an exception for any token whose position increment is not 1. Link to source.
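To make the conflict concrete, here is a plain-Java model (not Lucene's API) of what happens to the two dictionary entries. It assumes bigram shingles with unigrams kept, which is the shingle filter's default behaviour, and it mirrors the "position increment must be 1" rule with a hypothetical `validateSynonymTerm` helper. The class, record, and method names are all illustrative; the error text only approximates the message Lucene produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the analysis chain; this is NOT Lucene's API.
public class ShingleSynonymCheck {

    // A token and its position increment, standing in for Lucene's
    // CharTermAttribute / PositionIncrementAttribute pair.
    record Token(String text, int posInc) {}

    // Simplified lowercase + bigram shingling with unigrams kept:
    // each shingle is stacked on the unigram at the same position (posInc 0).
    static List<Token> shingle(String input) {
        String[] words = input.toLowerCase().trim().split("\\s+");
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1));
            if (i + 1 < words.length) {
                out.add(new Token(words[i] + " " + words[i + 1], 0));
            }
        }
        return out;
    }

    // Models the rule applied while parsing a synonym entry:
    // every token produced for the entry must advance the position by exactly 1.
    static void validateSynonymTerm(String term, List<Token> tokens) {
        for (Token t : tokens) {
            if (t.posInc() != 1) {
                throw new IllegalArgumentException("term: " + term
                        + " analyzed to a token (" + t.text()
                        + ") with position increment != 1 (got: " + t.posInc() + ")");
            }
        }
    }

    public static void main(String[] args) {
        for (String term : new String[] {"red", "red shift"}) {
            List<Token> tokens = shingle(term);
            try {
                validateSynonymTerm(term, tokens);
                System.out.println(term + " -> OK " + tokens);
            } catch (IllegalArgumentException e) {
                System.out.println(term + " -> " + e.getMessage());
            }
        }
    }
}
```

The single-word entry `red` passes, while `red shift` is rejected as soon as the stacked shingle `red shift` (increment 0) appears. That is why any multi-word entry fails once a shingle filter sits in front of the synonym filter.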
I tried to work around this issue in the Lucene code by using a different analyzer to parse the synonyms.
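The post does not say how the separate parsing analyzer is configured. Purely as an assumption, the sketch below models one option. It keeps each dictionary entry as a single lowercased token with no shingling, so the entry passes the increment check, and `red shift` then lines up with the `red shift` shingle produced at index time (this assumes the shingle separator is a single space). It is a plain-Java illustration with hypothetical names (`synonymAnalyzer`, `buildDictionary`, `map`). It is not Lucene's `SynonymMap` or `SynonymGraphFilter`, and it ignores how the real graph filter matches multi-token input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "parse synonyms with a different analyzer"; NOT Lucene's API.
public class SeparateSynonymAnalyzer {

    record Token(String text, int posInc) {}

    // Index-time chain (simplified): lowercase + bigram shingles with unigrams kept.
    static List<Token> indexAnalyzer(String input) {
        String[] words = input.toLowerCase().trim().split("\\s+");
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1));
            if (i + 1 < words.length) {
                out.add(new Token(words[i] + " " + words[i + 1], 0));
            }
        }
        return out;
    }

    // Assumed synonym-parsing analyzer: no shingle step, the whole entry
    // becomes one lowercased token, so its increment is always 1.
    static List<Token> synonymAnalyzer(String entry) {
        return List.of(new Token(entry.toLowerCase().trim(), 1));
    }

    // Builds term -> id, still enforcing the "increment must be 1" rule.
    static Map<String, String> buildDictionary(Map<String, String> rules) {
        Map<String, String> dict = new HashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            for (Token t : synonymAnalyzer(rule.getKey())) {
                if (t.posInc() != 1) {
                    throw new IllegalArgumentException("position increment != 1 for " + t.text());
                }
                dict.put(t.text(), rule.getValue());
            }
        }
        return dict;
    }

    // Looks up every index-time token, including stacked shingles,
    // so overlapping entries can both produce an id.
    static List<String> map(String text, Map<String, String> dict) {
        List<String> ids = new ArrayList<>();
        for (Token t : indexAnalyzer(text)) {
            String id = dict.get(t.text());
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> dict = buildDictionary(
                Map.of("red", "color_1", "red shift", "company_1"));
        System.out.println(map("Red Shift", dict)); // prints [color_1, company_1]
        System.out.println(map("red car", dict));   // prints [color_1]
    }
}
```

Because the stacked shingle is looked up alongside the unigram at the same position, both `color_1` and `company_1` come out for "Red Shift", which is the overlapping behaviour the dictionary is meant to support.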