Invalid synonym rule when using two files


(Bilal) #1

I have two synonym files with a few thousand lines each. Here is a sample that causes the problem:

en_synonyms file:

cereal, semolina, wheat

fr_synonyms file:

ble, cereale, wheat

This is the error I got:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: wheat analyzed to a token (cereal) with position increment != 1 (got: 0)"
      }
    }
  },
  "status": 400
}

The mapping I used:

PUT wheat_syn
{
  "mappings": {
    "wheat": {
      "properties": {
        "description": {
          "type": "text",
          "fields": {
            "synonyms": {
              "type": "text",
              "analyzer": "syn_text"
            },
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "en_synonyms": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms_path" : "analysis/en_synonyms.txt"
        },
        "fr_synonyms": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms_path" : "analysis/fr_synonyms.txt"
        }
      },
      "analyzer": {
        "syn_text": {
          "tokenizer": "standard",
          "filter": ["lowercase", "en_synonyms", "fr_synonyms" ]
        }
      }
    }
  }
}

Both files contain the term wheat. When I remove it from one of them, the index is created successfully.

I thought about combining the two files, so the result would be:

cereal, semolina, wheat, ble, cereale

But in my case I can't do that manually, since it would take too much time. I'll look for a way to do it programmatically, depending on the answer to this question.
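For anyone who wants to try the programmatic merge, here is a minimal Python sketch. It assumes every line in both files is a plain comma-separated list of equivalent terms (no explicit "=>" mappings), and it joins any lines that share at least one term into a single group. The function name merge_synonym_groups is made up for this example and is not part of Elasticsearch.

```python
def merge_synonym_groups(lines):
    """Merge comma-separated synonym lines that share at least one term.

    Assumption: each line lists equivalent terms only; explicit
    mappings ("a => b") are rejected rather than guessed at.
    """
    parent = {}   # union-find forest: term -> parent term
    order = []    # terms in first-seen order, to keep output stable

    def find(term):
        # Walk up to the root, shortening the path as we go.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            raise ValueError("explicit mappings are not handled: " + line)
        terms = [t.strip() for t in line.split(",") if t.strip()]
        for t in terms:
            if t not in parent:
                parent[t] = t
                order.append(t)
        # Link every term on the line to the first term's group.
        for t in terms[1:]:
            parent[find(t)] = find(terms[0])

    groups = {}
    for t in order:
        groups.setdefault(find(t), []).append(t)
    return [", ".join(g) for g in groups.values()]


if __name__ == "__main__":
    en = ["cereal, semolina, wheat"]
    fr = ["ble, cereale, wheat"]
    for merged in merge_synonym_groups(en + fr):
        print(merged)  # cereal, semolina, wheat, ble, cereale
```

To use it on real files, read the lines of both files, pass the combined list in, and write the returned lines to a new synonyms file. Note that merging makes all terms in a group equivalent to each other, which may or may not be what you want across languages.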


(Bilal) #2

Found a simple solution:

Instead of using two files, I just concatenated the contents of en_synonyms and fr_synonyms into a single file, all_synonyms:

cereal, semolina, wheat
ble, cereale, wheat

Then I used that single file in the index settings in place of the two separate files.
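As a rough illustration of what changes, the sketch below builds the settings part of the request with one synonym filter instead of two. The filter name all_synonyms and the path analysis/all_synonyms.txt are assumptions modelled on the original request; everything else is copied from the settings in the first post. The helper name single_synonym_settings is hypothetical.

```python
import json


def single_synonym_settings(synonyms_path="analysis/all_synonyms.txt"):
    """Return index settings that use one synonym filter.

    Assumption: the combined file lives at `synonyms_path` under the
    Elasticsearch config directory, like the two original files.
    """
    return {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "all_synonyms": {
                    "type": "synonym",
                    "tokenizer": "keyword",
                    "synonyms_path": synonyms_path,
                }
            },
            "analyzer": {
                "syn_text": {
                    "tokenizer": "standard",
                    # One synonym filter replaces en_synonyms + fr_synonyms.
                    "filter": ["lowercase", "all_synonyms"],
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the PUT request body
    # alongside the unchanged "mappings" section.
    print(json.dumps({"settings": single_synonym_settings()}, indent=2))
```

The "mappings" section stays as it was, since the field still refers to the syn_text analyzer by name.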


(Christoph) #3

Hi @Bilal

Great that you found a solution. Your original problem still seems strange to me, and I'd like to verify the behaviour, but I haven't found the time to do so yet. Maybe we can improve the way multiple chained synonym filters are handled to avoid such problems.


(Christoph) #4

Hi @Bilal, FYI: I opened https://github.com/elastic/elasticsearch/issues/36433 so this issue isn't forgotten. Maybe somebody has a better explanation than I do for why it behaves this way, but I'd like to keep track of it and investigate whether we can improve the situation here.


(Christoph) #5

Looks like there is something in the works for version 6.6 that will remedy this: https://github.com/elastic/elasticsearch/pull/34331