The synonym token filter in Elasticsearch 2.4 supports a tokenizer parameter. Hence the token filter can have its own tokenizer (here "keyword"), different from the one used by the custom analyzer (here "whitespace").
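For readers who have not seen this option, a 2.4-style setting of that kind might look roughly like the sketch below. This is not the original poster's configuration: the names example_synonyms and example_analyzer are made up for illustration, and the synonym line is the one discussed in this thread.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "example_synonyms": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms": ["abc,xyz,lmn pqr"]
        }
      },
      "analyzer": {
        "example_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["example_synonyms"]
        }
      }
    }
  }
}
```

Here the "tokenizer" entry inside the filter only controls how the synonym rules themselves are split, so "lmn pqr" stays one term, while the analyzer still tokenizes the input text on whitespace.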
The use case is a term that has a multi-word synonym.
For example, take the following synonym list:
abc,xyz,lmn pqr
Now if the input string for the analyzer is xyz, the expected output after analysis is these three terms: abc, xyz and "lmn pqr".
But since we can no longer specify a tokenizer (keyword) for the synonym filter in Elasticsearch 6.x, the synonym "lmn pqr" gets tokenized into two terms, lmn and pqr, which is not what was intended.
I understand the indexing phase, but I'm still not sure how you use the "lmn pqr" token.
Could you explain your whole use case? How do you search, and what do you want to do with the "lmn pqr" term?
It won't work for a terms aggregation, but if you want these multi-word synonyms to become a single keyword, you can change your synonym rule and apply it at both index and query time: abc,xyz,lmn pqr => abc,xyz,lmn_pqr
So in this example the synonym terms will be abc, xyz and lmn_pqr, and the terms aggregation would correctly return the count for the term lmn_pqr?
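A minimal sketch of how the rewritten rule could be configured in 6.x, assuming a whitespace tokenizer so that lmn_pqr is kept as one token. The filter and analyzer names are hypothetical, and nothing here has been confirmed by the posters in this thread.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "example_synonyms": {
          "type": "synonym",
          "synonyms": ["abc,xyz,lmn pqr => abc,xyz,lmn_pqr"]
        }
      },
      "analyzer": {
        "example_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["example_synonyms"]
        }
      }
    }
  }
}
```

Using the same analyzer for the field at index time and at search time is what "apply it at index and query time" amounts to: any of abc, xyz or "lmn pqr" on either side should then map to the same set of terms.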