When creating an index we apply token filters in the following order:
- Stop token filter
- Stemmer token filter
- Synonym filter (the synonyms are user-defined and then included in the index)
We apply them in this order for the following reasons:
1) Applying the synonym filter after the stemmer means our users only need to define one synonym, which then covers all inflections of the word. For example, the synonym "mountain bike,bmx" means a search for "mountain biking" also matches "bmx".
2) We apply the stop token filter before the stemmer because, when it ran after the stemmer, it removed certain stemmed tokens whose original token would not be considered a stop word (such as "one", which is stemmed to "on" and would then be removed by the stop token filter).
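For reference, the chain described above could be expressed roughly as the following index settings, built here as a Python dict and printed as JSON. This is only a sketch: the names `my_stop`, `my_stemmer`, `my_synonyms` and `my_index_analyzer`, the `standard` tokenizer, and the English stop list and stemmer language are assumptions for illustration, not our actual configuration.

```python
import json

# Hypothetical filter/analyzer names. Only the filter types
# ("stop", "stemmer", "synonym") are Elasticsearch built-ins.
settings = {
    "analysis": {
        "filter": {
            # Assumed: the built-in English stop word list.
            "my_stop": {"type": "stop", "stopwords": "_english_"},
            # Assumed: English stemming.
            "my_stemmer": {"type": "stemmer", "language": "english"},
            # User-defined synonyms, in the comma-separated rule format.
            "my_synonyms": {
                "type": "synonym",
                "synonyms": ["mountain bike,bmx"],
            },
        },
        "analyzer": {
            "my_index_analyzer": {
                "type": "custom",
                "tokenizer": "standard",  # assumed tokenizer
                # Order matters: stop -> stemmer -> synonym.
                "filter": ["my_stop", "my_stemmer", "my_synonyms"],
            }
        },
    }
}

print(json.dumps(settings, indent=2))
```

The printed JSON is the shape that would go under `settings` in an index creation request.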
The issue we are seeing with this order is an error when a synonym includes a stop word. For example, the synonym "5sos,5 seconds of summer" raises the error "term: 5 seconds of summer analyzed to a token (summer) with position increment != 1 (got: 2)". I understand that moving the stop word filter to after the synonym filter can resolve the issue, but this would cause problem 2) above. If we also move the stemmer filter, our users will need to define synonyms for all inflections, as per 1) above.
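As far as I understand it, the "position increment != 1 (got: 2)" part of the error comes from the gap left behind when the stop filter drops "of" from the synonym text. The snippet below is a simplified model of that behaviour in plain Python, not Lucene's actual implementation; `STOP_WORDS` and `stop_filter` are hypothetical names used only for illustration.

```python
# Tiny illustrative stop list (assumption, not the real English list).
STOP_WORDS = {"of", "the", "a", "an", "and"}

def stop_filter(tokens):
    """Drop stop words and return (token, position_increment) pairs.

    A removed token leaves a gap, so the next kept token's
    position increment grows by one for each token removed before it.
    """
    out = []
    increment = 1
    for token in tokens:
        if token in STOP_WORDS:
            increment += 1  # remember the gap left by the removed token
            continue
        out.append((token, increment))
        increment = 1
    return out

result = stop_filter("5 seconds of summer".split())
print(result)  # [('5', 1), ('seconds', 1), ('summer', 2)]
```

Here "summer" ends up with an increment of 2, which matches the value reported in the error message.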
Possibly 1) is unavoidable, and the answer is to move the stop word and stemmer filters after the synonym filter. However, I wanted to understand whether there is an alternative, such as customising the synonym filter so that stop words are removed from the indexed synonyms and/or the synonyms are stemmed.
Any guidance would be appreciated.