ES 6.4.2 - Synonym Filter Working Out Of Order

Version 6.4.2.

Issue - When two synonym filters are defined in an analyzer, the synonyms in the second filter are applied before those in the first filter.

Expected - Synonyms in the second filter should be applied after the synonyms in the first filter.

Steps to Reproduce

Step 1: Create Index

PUT /es_synonym_bug

{
  "mappings": {
    "doc": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "analysis": {
        "filter": {
          "first_synonym": {
            "type": "synonym",
            "synonyms": [ "b => c", "f => g" ]
          },
          "second_synonym": {
            "type": "synonym",
            "synonyms": [ "c => d", "e => f" ]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "first_synonym",
              "second_synonym"
            ],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "0"
    }
  }
}

Step 2: Run an analyze query that works in order

GET /es_synonym_bug/_analyze

{
  "analyzer" : "my_analyzer",
  "text" : "b"
}

Output

{
  "tokens": [
    {
      "token": "d",
      "start_offset": 0,
      "end_offset": 1,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

This is fine ^.

Step 3: Run an analyze query that works out of order. Here's the problem query -

GET /es_synonym_bug/_analyze

{
  "analyzer" : "my_analyzer",
  "text" : "e"
}

Output

{
  "tokens": [
    {
      "token": "g",
      "start_offset": 0,
      "end_offset": 1,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

Expected -

{
  "tokens": [
    {
      "token": "f",
      "start_offset": 0,
      "end_offset": 1,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
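To spell out the ordering I expect, here is a minimal sketch in Python. It is only a model of the expected behaviour, not Elasticsearch code: the dictionaries mirror the rules in `first_synonym` and `second_synonym` above, while `apply_filter` and `analyze` are hypothetical helper names made up for this illustration. Splitting on whitespace stands in for the `whitespace` tokenizer.

```python
# Hypothetical model of the EXPECTED behaviour; this is not how
# Elasticsearch is implemented, only what the report expects to see.
FIRST_SYNONYM = {"b": "c", "f": "g"}   # rules from first_synonym
SECOND_SYNONYM = {"c": "d", "e": "f"}  # rules from second_synonym


def apply_filter(tokens, rules):
    """Replace each token that has a rule; pass other tokens through."""
    return [rules.get(token, token) for token in tokens]


def analyze(text, filters=(FIRST_SYNONYM, SECOND_SYNONYM)):
    """Split on whitespace, then run the filters strictly in list order."""
    tokens = text.split()
    for rules in filters:
        tokens = apply_filter(tokens, rules)
    return tokens


print(analyze("b"))  # ['d']  b => c (first), then c => d (second)
print(analyze("e"))  # ['f']  untouched by first, then e => f (second)
```

Under this model "e" can only become "g" if `f => g` from the first filter runs after `e => f` from the second, which is the out-of-order result shown in Step 3.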

Hi @abhinav2,

Welcome to the forum. Thanks to your great reproduction script, I was able to try this out on 7.2.0. I get your expected output of

{
  "tokens": [
    {
      "token": "f",
      "start_offset": 0,
      "end_offset": 1,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

there, so I suspect there might have been a bugfix in the meantime. I will try to find out whether this is a known issue and when it was fixed. In the meantime, you might want to try it with 7.2.0 yourself and see if the order of filters works for you there.

FYI, I haven't found the exact change yet, but the behaviour in this particular case was still as you described in 6.5 and seems to be working as expected in 6.6, so the fix probably landed somewhere around that time. Maybe you can try 6.6 and see if it solves your problem.
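If it helps with comparing versions, here is a rough sketch that re-runs both `_analyze` checks from the reproduction using only the Python standard library. It assumes a node reachable at `http://localhost:9200` without authentication and that the `es_synonym_bug` index already exists on it; both are assumptions, and the helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: local, unsecured node. Adjust for your cluster.
BASE_URL = "http://localhost:9200"


def build_analyze_request(base_url, index, analyzer, text):
    """Build a POST request for the index-level _analyze endpoint."""
    url = f"{base_url}/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def tokens_from_response(response):
    """Pull just the token strings out of a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(base_url, index, analyzer, text):
    """Send the request and return the resulting token strings."""
    request = build_analyze_request(base_url, index, analyzer, text)
    with urllib.request.urlopen(request) as reply:
        return tokens_from_response(json.load(reply))


def check_filter_order(base_url=BASE_URL):
    """Compare both inputs from the report against the expected tokens."""
    expectations = {"b": ["d"], "e": ["f"]}
    for text, expected in expectations.items():
        actual = analyze(base_url, "es_synonym_bug", "my_analyzer", text)
        status = "ok" if actual == expected else "out of order"
        print(f"{text!r}: got {actual}, expected {expected} ({status})")
```

Calling `check_filter_order()` against each version you test prints one line per input, so the 6.4.2 result for "e" should stand out next to the result from a newer version.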

Okay, thanks for taking the time to reply. I'll try 6.6.
