Confusion about using the synonym token filter in Elasticsearch 6.x

The synonym token filter in Elasticsearch v2.4 supports a tokenizer parameter. The filter can therefore have its own tokenizer (here "keyword") that differs from the one used by the custom analyzer (here "whitespace"), as in the settings below:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms_path": "synonym.txt",
            "tokenizer": "keyword"
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_synonym_filter"
            ],
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}

How can I achieve the same in Elasticsearch v6.x?

Hi,

Can you explain your use case in more detail?
I'm not sure why "whitespace" is not good enough...

Hi,

The use case is a term that has a multi-word synonym.
For example, take the following synonym list:
abc,xyz,lmn pqr

Now if the input string for the analyzer is xyz, the expected output after analysis is the following terms:
abc
xyz
lmn pqr
But since we can no longer specify a tokenizer (keyword) for the synonym filter in Elasticsearch 6.x, the synonym lmn pqr gets tokenized into two terms, lmn and pqr, which is not what I intended.

I understand the indexing phase, but I'm still not sure how you use the "lmn pqr" token.
Could you explain your whole use case? How do you search, and how do you want to use the "lmn pqr" term?

For multi-word synonyms you should not use the synonym filter but the synonym_graph filter, which handles multi-word synonyms correctly:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html
This filter is designed to be used only at query time. Query parsers can now detect multi-word synonyms and build a phrase query for them ("lmn pqr" in your example):
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#query-dsl-match-query-synonyms
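
As a rough sketch of what that could look like, assuming the same synonym.txt file and whitespace tokenizer as in the question: the settings below define one analyzer without synonyms for indexing and one with synonym_graph for searching. The names my_graph_synonyms, my_index_analyzer and my_search_analyzer are made up for illustration.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_graph_synonyms": {
            "type": "synonym_graph",
            "synonyms_path": "synonym.txt"
          }
        },
        "analyzer": {
          "my_index_analyzer": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase"
            ]
          },
          "my_search_analyzer": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "my_graph_synonyms"
            ]
          }
        }
      }
    }
  }
}
```

The text field would then reference these through the "analyzer" and "search_analyzer" mapping parameters, so that synonyms are expanded only when the query is analyzed.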


What about the terms aggregation? If I want the doc count for the term lmn pqr, then I guess it won't work.

It won't work for the terms aggregation, but if you want to turn these multi-word terms into a single keyword you can change your synonym rule and apply it at both index and query time:
abc,xyz,lmn pqr => abc,xyz,lmn_pqr
In this example the synonym terms will be abc, xyz and lmn_pqr, so the terms aggregation should correctly return the count for the term lmn_pqr.
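
A minimal sketch of settings using that rule, reusing the filter and analyzer names from the question. The rule is given inline through the "synonyms" parameter here only to keep the example self-contained; it could just as well stay in synonym.txt.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms": [
              "abc,xyz,lmn pqr => abc,xyz,lmn_pqr"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "my_synonym_filter"
            ]
          }
        }
      }
    }
  }
}
```

Since the whitespace tokenizer does not split on underscores, lmn_pqr should come out as a single term. Note that running a terms aggregation on a text field also requires fielddata to be enabled on that field.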

Thanks.
Let me try this and see whether it caters to my needs.
