The synonym token filter in Elasticsearch v2.4 supports a `tokenizer` parameter, so the filter can parse its synonym rules with its own tokenizer (here "keyword"), different from the one used by the custom analyzer (here "whitespace"), as in the setting below:
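A minimal sketch of such a v2.4 setting, written as a Python dict and serialized to JSON. The filter and analyzer names are hypothetical, and the synonym rule `abc, xyz, lmn pqr` is an assumption based on the rule quoted later in this thread:

```python
import json

# Hypothetical ES 2.4 index settings. The synonym filter parses its rules with
# its own "keyword" tokenizer, while the analyzer splits the input on whitespace.
# "my_synonyms" and "my_analyzer" are illustrative names; the rule is assumed.
settings_v24 = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "tokenizer": "keyword",  # tokenizes the synonym rules, not the input text
                    "synonyms": ["abc, xyz, lmn pqr"],
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",  # tokenizes the input text
                    "filter": ["my_synonyms"],
                }
            },
        }
    }
}

print(json.dumps(settings_v24, indent=2))
```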
How can I achieve the same in Elasticsearch v6.x?
Can you explain your use case in more detail?
I'm not sure why "whitespace" is not good enough.
The use case is a term that has a multi-word synonym.
For example, the following is the synonym list:
Now, if the input string for the analyzer is `xyz`, the expected output after analysis should be the following terms:
But since we can no longer specify a tokenizer (`keyword`) for the synonym filter in Elasticsearch 6.x, the synonym `lmn pqr` gets tokenized into two terms, `lmn` and `pqr`, which is not what was intended.
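To illustrate the difference, here is a toy Python model (not Elasticsearch's actual rule parser) of how the tokenizer used to parse the rule changes what `xyz` expands to. It assumes the synonym list is `abc, xyz, lmn pqr` and ignores token positions entirely:

```python
# Toy model, not Elasticsearch code: a comma-separated synonym rule is parsed
# with a given tokenizer, and a matching single input term expands to all entries.
# The rule "abc, xyz, lmn pqr" is an assumption taken from later in the thread.

def keyword_tokenizer(text):
    # Emits the whole (trimmed) entry as a single token.
    return [text.strip()]

def whitespace_tokenizer(text):
    # Splits the entry on whitespace.
    return text.split()

def parse_rule(rule, tokenizer):
    # Each comma-separated entry becomes a tuple of tokens.
    return [tuple(tokenizer(entry)) for entry in rule.split(",")]

def expand(term, rule, tokenizer):
    # If the input term matches a single-token entry, return every entry's
    # tokens as a flat list of output terms; otherwise pass the term through.
    entries = parse_rule(rule, tokenizer)
    if (term,) not in entries:
        return [term]
    return [tok for entry in entries for tok in entry]

rule = "abc, xyz, lmn pqr"
print(expand("xyz", rule, keyword_tokenizer))     # ['abc', 'xyz', 'lmn pqr']
print(expand("xyz", rule, whitespace_tokenizer))  # ['abc', 'xyz', 'lmn', 'pqr']
```

With the keyword tokenizer, `lmn pqr` survives as one term; with whitespace, it is split into `lmn` and `pqr`.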
I understand the indexing phase, but I'm still not sure how you use the "lmn pqr" token.
Could you explain your whole use case? How do you search, and what do you want to use the "lmn pqr" term for?
For multi-word synonyms you should not use the `synonym` filter but the `synonym_graph` filter, which handles multi-word synonyms correctly:
This filter is designed to be used only at query time; query parsers are now able to detect multi-word synonyms and build a phrase query for them ("lmn pqr" in your example).
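A sketch of the v6.x equivalent, assuming the same synonym list `abc, xyz, lmn pqr`. The index, field, filter and analyzer names and the `_doc` type name are hypothetical. The built-in `whitespace` analyzer is used at index time, and the `synonym_graph` filter appears only in the search analyzer:

```python
import json

# Hypothetical ES 6.x index body: plain whitespace analysis at index time, and a
# search-time analyzer whose synonym_graph filter expands "xyz" to abc / xyz /
# "lmn pqr". All names here are illustrative assumptions.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_graph_synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["abc, xyz, lmn pqr"],
                }
            },
            "analyzer": {
                "my_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["my_graph_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "_doc": {  # 6.x indices still take one mapping type name
            "properties": {
                "text": {
                    "type": "text",
                    "analyzer": "whitespace",  # index time: no synonyms
                    "search_analyzer": "my_search_analyzer",  # query time: synonym_graph
                }
            }
        }
    },
}

# A match query on "xyz"; the query parser should turn the multi-word synonym
# "lmn pqr" into a phrase clause alongside the single-term synonyms.
search_body = {"query": {"match": {"text": "xyz"}}}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```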
What about a `terms` aggregation? If I want the doc count for the term `lmn pqr`, I guess it won't work.
For a `terms` aggregation it won't work, but if you want to turn these multi-word terms into a single keyword, you can instead change your synonym rule and apply it at both index and query time:
`abc,xyz,lmn pqr => abc,xyz,lmn_pqr`
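A sketch of this index-and-query-time variant in v6.x, again with hypothetical names. Because every right-hand-side entry is a single token, the plain `synonym` filter can run at index time. Note that a `terms` aggregation on a `text` field also requires `fielddata` to be enabled, which can be memory-hungry on large indices:

```python
import json

# Hypothetical ES 6.x index applying the contraction rule at BOTH index and
# query time, so "lmn pqr" is stored as the single term "lmn_pqr".
# Names ("my_contraction", "my_analyzer", "text", "_doc") are illustrative only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_contraction": {
                    "type": "synonym",  # plain synonym filter, usable at index time
                    "synonyms": ["abc,xyz,lmn pqr => abc,xyz,lmn_pqr"],
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["my_contraction"],
                }
            },
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "text": {
                    "type": "text",
                    "analyzer": "my_analyzer",  # also used at search time when no search_analyzer is set
                    "fielddata": True,  # needed for a terms aggregation on a text field
                }
            }
        }
    },
}

# Terms aggregation over the analyzed terms; "lmn_pqr" should appear as one bucket.
agg_body = {"size": 0, "aggs": {"synonym_terms": {"terms": {"field": "text"}}}}

print(json.dumps(index_body, indent=2))
print(json.dumps(agg_body, indent=2))
```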
In this example the multi-word synonym is indexed as the single term `lmn_pqr`, so the `terms` aggregation would correctly return the count for that term.
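A toy Python model (not Elasticsearch code) of why the contraction helps: once the rule is applied, `lmn pqr` becomes the single term `lmn_pqr`, so counting documents per term, roughly what a `terms` aggregation reports as `doc_count`, yields one bucket for it. The greedy longest-match strategy and the sample documents are assumptions made for illustration:

```python
from collections import Counter

# Toy model, not Elasticsearch code, of the contraction rule
#   abc,xyz,lmn pqr => abc,xyz,lmn_pqr
# applied to whitespace tokens, followed by a per-term document count.
LHS = [("abc",), ("xyz",), ("lmn", "pqr")]
RHS = ["abc", "xyz", "lmn_pqr"]

def analyze(text):
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest left-hand-side entry first (greedy matching).
        for entry in sorted(LHS, key=len, reverse=True):
            if tuple(tokens[i:i + len(entry)]) == entry:
                out.extend(RHS)  # replace the match with all right-hand-side terms
                i += len(entry)
                break
        else:
            out.append(tokens[i])  # no rule matched: keep the token as-is
            i += 1
    return out

def doc_counts(docs):
    # Count each term at most once per document, like a terms agg doc_count.
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc)))
    return counts

docs = ["lmn pqr is here", "xyz only", "nothing relevant"]
print(analyze("lmn pqr is here"))   # ['abc', 'xyz', 'lmn_pqr', 'is', 'here']
print(doc_counts(docs)["lmn_pqr"])  # 2
```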
Let me check whether this solution caters to my needs.