The problem is with line 40602 in your synonym file:
s(104178190,2,'78',n,2,0).
The number 78 is removed entirely by your lowercase tokenizer. That tokenizer is based on the letter tokenizer, which splits on non-letter characters and discards them, so digits never reach the synonym filter and this rule cannot be parsed.
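You can see this with the _analyze API (a quick check that needs no index; the text value is just an arbitrary example):
POST _analyze
{
  "tokenizer": "lowercase",
  "text": "78 rpm record"
}
The response contains tokens for rpm and record, but nothing for 78.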
There are a few ways to solve this. First, you could remove the entries that are pure numbers from your synonym file, such as line 40602 in wn_s.pl.
Second, you could switch to a tokenizer that does not drop numbers, for example the standard tokenizer in combination with the lowercase token filter (probably the best solution):
PUT /analyzers-blog-04-02
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl"
        }
      },
      "analyzer": {
        "wordnet-synonym-analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "synonym"
          ]
        }
      }
    }
  }
}
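Once the index exists, you can verify that the number survives analysis (assuming the index and analyzer names from the request above):
GET /analyzers-blog-04-02/_analyze
{
  "analyzer": "wordnet-synonym-analyzer",
  "text": "78"
}
The response should now contain a token for 78, plus whatever synonyms wn_s.pl defines for it.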
Third, if you really want to keep the lowercase tokenizer, you can set "lenient": true in the synonym token filter definition, which makes the filter ignore synonym rules it cannot parse instead of failing:
PUT /analyzers-blog-04-02
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl",
          "lenient": true
        }
      },
      "analyzer": {
        "wordnet-synonym-analyzer": {
          "tokenizer": "lowercase",
          "filter": [
            "synonym"
          ]
        }
      }
    }
  }
}
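Keep in mind that lenient only stops the synonym filter from failing: the numeric rules are simply skipped, and the lowercase tokenizer still drops digits at index and search time. Running the same check against an index created with this definition (again assuming the names above) returns no tokens at all:
GET /analyzers-blog-04-02/_analyze
{
  "analyzer": "wordnet-synonym-analyzer",
  "text": "78"
}
So if you need numbers to participate in synonym matching, the standard tokenizer variant above is the better choice.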