Multiplexer with synonyms doesn't work as expected

NikKozh · June 6, 2024, 12:25pm

Hello, I'm using Elasticsearch 8.13 (almost the latest version) and am trying to build an analyzer with a multiplexer that uses a synonym filter. However, I found that the results of simple filter chaining (without a multiplexer) differ from those of a multiplexer with the same token filters. I made an artificial example; here are the two analyzers:

PUT /test-index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "test_analyzer": {
            "tokenizer": "classic",
            "filter": [
              "test_stemmer",
              "test_synonym_filter"
            ]
          },
          "test_analyzer_multiplexer": {
            "tokenizer": "classic",
            "filter": [
              "multiplexer_custom"
            ]
          }
        },
        "filter": {
          "multiplexer_custom": {
            "type": "multiplexer",
            "filters": [
              "test_stemmer, test_synonym_filter"
            ],
            "preserve_original": true
          },
          "test_synonym_filter": {
            "type": "synonym_graph",
            "synonyms": [
              "walking, jumping fox"
            ]
          },
          "test_stemmer": {
            "type": "stemmer",
            "language": "english"
          }
        }
      }
    }
  }
}
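As a sanity check that both analyzers really apply the same filter chain, here is a minimal Python sketch. It mirrors the `analysis` settings above as a dict and expands the multiplexer's `filters` entries, assuming (as the Elasticsearch multiplexer docs describe) that each entry is a comma-delimited chain. The helper `multiplexer_chains` is hypothetical and only inspects the settings; it does not talk to a cluster.

```python
# Mirror of the "analysis" settings from the PUT /test-index request above.
analysis = {
    "analyzer": {
        "test_analyzer": {
            "tokenizer": "classic",
            "filter": ["test_stemmer", "test_synonym_filter"],
        },
        "test_analyzer_multiplexer": {
            "tokenizer": "classic",
            "filter": ["multiplexer_custom"],
        },
    },
    "filter": {
        "multiplexer_custom": {
            "type": "multiplexer",
            "filters": ["test_stemmer, test_synonym_filter"],
            "preserve_original": True,
        },
        "test_synonym_filter": {
            "type": "synonym_graph",
            "synonyms": ["walking, jumping fox"],
        },
        "test_stemmer": {"type": "stemmer", "language": "english"},
    },
}


def multiplexer_chains(filter_def):
    """Split each comma-delimited "filters" entry into a list of filter names (one chain per entry)."""
    return [
        [name.strip() for name in entry.split(",") if name.strip()]
        for entry in filter_def["filters"]
    ]


chains = multiplexer_chains(analysis["filter"]["multiplexer_custom"])
plain = analysis["analyzer"]["test_analyzer"]["filter"]
print(chains)             # [['test_stemmer', 'test_synonym_filter']]
print(chains == [plain])  # True: one chain, identical to test_analyzer's filter list

# Assumed usage with elasticsearch-py 8.x (not executed here):
# es.indices.create(index="test-index", settings={"index": {"analysis": analysis}})
```

So the multiplexer has a single chain, `test_stemmer` followed by `test_synonym_filter`, which is the same sequence `test_analyzer` uses; the only configured difference is `preserve_original`.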

They are basically identical and, I assume, should produce the same results. But when I test them, I get different tokens. With simple filter chaining everything is fine:

GET test-index/_analyze
{
  "analyzer": "test_analyzer",
  "text": "jumping fox"
}

Result:

{
  "tokens": [
    {
      "token": "walk",
      "start_offset": 0,
      "end_offset": 11,
      "type": "SYNONYM",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "jump",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "fox",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

I get my stemmed synonym "walk", as expected. But if I test the analyzer with the multiplexer:

GET test-index/_analyze
{
  "analyzer": "test_analyzer_multiplexer",
  "text": "jumping fox"
}

The result is:

{
  "tokens": [
    {
      "token": "jumping",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "jump",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "fox",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
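Reducing both responses to (token, position) pairs makes the difference explicit. This small Python sketch uses only the tokens copied from the two `_analyze` results above; `diff_tokens` is a hypothetical helper.

```python
# (token, position) pairs copied from the two _analyze responses above.
chained = [("walk", 0), ("jump", 0), ("fox", 1)]         # test_analyzer
multiplexed = [("jumping", 0), ("jump", 0), ("fox", 1)]  # test_analyzer_multiplexer


def diff_tokens(expected, actual):
    """Return (missing, extra): pairs only in `expected`, and pairs only in `actual`."""
    exp, act = set(expected), set(actual)
    return sorted(exp - act), sorted(act - exp)


missing, extra = diff_tokens(chained, multiplexed)
print("missing:", missing)  # missing: [('walk', 0)]
print("extra:", extra)      # extra: [('jumping', 0)]
```

The multiplexer output loses the SYNONYM token "walk" and gains the preserved original "jumping"; "jump" and "fox" are identical in both.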

No synonym at all. I believe this happens because the multiplexer preserves the original tokens by default, but then where is the synonym? If I set "preserve_original": false, I get the right result, but what if I need to keep the original tokens AND get synonyms while using a multiplexer?

Either this is some kind of bug or I don't fully understand how it is supposed to work.
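One way to see at which stage the synonym disappears is to rerun both analyzers through the `_analyze` API with its `explain` option, which returns per-stage detail (tokenizer and each token filter) instead of a flat token list. The sketch below assumes the official elasticsearch-py 8.x client, whose `indices.analyze` method takes `index`, `analyzer`, `text` and `explain` as keyword arguments; `run_analyzers` is a hypothetical helper, and the cluster URL in the usage comment is a placeholder.

```python
def run_analyzers(analyze, index, text, analyzers, explain=False):
    """Call `analyze` once per analyzer name and return the responses keyed by name.

    `analyze` is expected to behave like elasticsearch-py 8.x's
    `Elasticsearch.indices.analyze`, i.e. accept keyword arguments.
    """
    responses = {}
    for name in analyzers:
        kwargs = {"index": index, "analyzer": name, "text": text}
        if explain:
            # explain=True asks _analyze for per-stage "detail" output
            kwargs["explain"] = True
        responses[name] = analyze(**kwargs)
    return responses


# Assumed usage against a reachable cluster (placeholder URL, not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# details = run_analyzers(
#     es.indices.analyze,
#     "test-index",
#     "jumping fox",
#     ["test_analyzer", "test_analyzer_multiplexer"],
#     explain=True,
# )
# for name, resp in details.items():
#     print(name, resp["detail"])
```

Passing the bound `analyze` method rather than the whole client keeps the helper independent of the client version and easy to exercise with a stub.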

P.S. I saw an almost identical topic, "Synonym filter not working within a Multiplexer Filter", but I believe my case is different: my synonyms don't intersect with each other, so Lucene's RemoveDuplicatesTokenFilter should (I think) work correctly. Maybe something is going wrong somewhere else?
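The deduplication the P.S. relies on (Lucene's RemoveDuplicatesTokenFilter drops a token whose term text already occurred at the same position) can be modeled roughly as below. This is a simplified Python sketch, not Lucene's implementation: it works on absolute positions rather than position increments, and the input stream is hypothetical, not captured from Elasticsearch.

```python
def remove_duplicates(tokens):
    """Drop tokens whose term already appeared at the same position.

    `tokens` is a list of (term, position) pairs in stream order,
    with positions non-decreasing as in a token stream.
    """
    out = []
    seen_at_position = set()
    current = None
    for term, position in tokens:
        if position != current:
            # New position: forget the terms seen at the previous one.
            current = position
            seen_at_position = set()
        if term in seen_at_position:
            continue
        seen_at_position.add(term)
        out.append((term, position))
    return out


# Hypothetical stream with one duplicated "jump" at position 0.
stream = [("walk", 0), ("jump", 0), ("jump", 0), ("fox", 1)]
print(remove_duplicates(stream))  # [('walk', 0), ('jump', 0), ('fox', 1)]
```

Under this model only exact term repeats at one position are removed, so a distinct term such as "walk" would survive, which matches the reasoning in the P.S.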

