Confusion in using synonym token filter in elasticsearch 6.x

nishant.saini · April 18, 2018, 4:38am

The synonym token filter in Elasticsearch v2.4 supports a tokenizer parameter. The token filter could therefore have its own tokenizer (here "keyword"), different from the one used in the custom analyzer (here "whitespace"), as in the settings below:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms_path": "synonym.txt",
            "tokenizer": "keyword"
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_synonym_filter"
            ],
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}

How can I achieve the same in Elasticsearch v6.x?

johtani · April 18, 2018, 8:46am

Hi,

Can you explain your use case in more detail?
I'm not sure why "whitespace" is not good enough...

nishant.saini · April 18, 2018, 9:35am

Hi,

The use case is when a term has a multi-word synonym.
For example, take the following synonym list:
abc,xyz,lmn pqr

Now if the input string for the analyzer is xyz, the expected output after analysis is the following terms:
abc
xyz
lmn pqr
But since we can no longer specify a tokenizer (keyword) for the synonym filter in Elasticsearch 6.x, the synonym lmn pqr gets tokenized into two terms, lmn and pqr, which is not intended.
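
One way to check what the analyzer actually emits is the _analyze API. A minimal sketch of the request body, assuming it is sent to GET /my_index/_analyze, where my_index is a hypothetical index created with the my_synonyms analyzer from the settings above:

```json
{
  "analyzer": "my_synonyms",
  "text": "xyz"
}
```

The tokens array in the response lists each emitted term with its position, which should make it visible whether lmn pqr comes back as one term or as two.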

johtani · April 18, 2018, 10:29am

I understand the indexing phase, but I'm still not sure how you use the "lmn pqr" token.
Could you explain your whole use case? How do you search, and what/how do you want to use the "lmn pqr" term?

jimczi · April 18, 2018, 10:29am

For multi-word synonyms you should not use the synonym filter but the synonym_graph filter, which handles multi-word synonyms correctly:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html
This filter is designed to be used only at query time. Query parsers are now able to detect multi-word synonyms and build a phrase query for them ("lmn pqr" in your example):
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#query-dsl-match-query-synonyms
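
As a rough sketch of the query-time setup described above, the settings might look something like the following. The filter and analyzer names (my_graph_synonyms, my_index_analyzer, my_search_analyzer) are made up for illustration, and the inline synonyms list stands in for the synonym.txt file from the original question:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_graph_synonyms": {
            "type": "synonym_graph",
            "synonyms": ["abc, xyz, lmn pqr"]
          }
        },
        "analyzer": {
          "my_index_analyzer": {
            "tokenizer": "whitespace",
            "filter": ["lowercase"]
          },
          "my_search_analyzer": {
            "tokenizer": "whitespace",
            "filter": ["lowercase", "my_graph_synonyms"]
          }
        }
      }
    }
  }
}
```

The field's mapping would then presumably set my_index_analyzer as its analyzer and my_search_analyzer as its search_analyzer, so that synonyms are expanded only when a query is analyzed.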

nishant.saini · April 19, 2018, 5:50am

What about the terms aggregation? If I want the doc count for the term lmn pqr, then I guess it won't work.

jimczi · April 19, 2018, 6:46am

It won't work for the terms aggregation, but if you want to turn these multi-word terms into a single keyword, you can also change your synonym rule and apply it at both index and query time:
abc,xyz,lmn pqr => abc,xyz,lmn_pqr
In this example the synonym terms will be abc, xyz and lmn_pqr, so the terms aggregation should correctly return the count for the term lmn_pqr.
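
A minimal sketch of settings using such a rule, assuming the same filter and analyzer names as in the original question and an inline synonyms list in place of synonym.txt:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms": ["abc, xyz, lmn pqr => abc, xyz, lmn_pqr"]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "tokenizer": "whitespace",
            "filter": ["lowercase", "my_synonym_filter"]
          }
        }
      }
    }
  }
}
```

Because the whitespace tokenizer does not split on underscores, lmn_pqr should stay a single term. Note that running a terms aggregation on an analyzed text field would additionally need fielddata enabled on that field, since it is disabled by default for text fields.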

nishant.saini · April 20, 2018, 3:46am

Thanks.
Let me check whether this solution caters to my needs.

system · May 18, 2018, 3:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
