Same synonyms in different synonym files

Hello,
It seems that synonyms in the 2nd file are not taken into account if the same terms are already used in the 1st file.
In the 1st file, I have:
aaa,bbb

In the 2nd file, I have:
aaa,synaaa
bbb,synbbb
ccc,synccc

When I run _analyze, e.g.:

{
  "explain": false, 
  "analyzer": "synonym_full_indexing_analyzer",
  "text" : "ccc"
}

I get, for each input term:

  • aaa --> aaa, bbb (I am not getting synaaa)
  • bbb --> aaa, bbb (I am not getting synbbb)
  • ccc --> ccc, synccc
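
(A side note for reproducing this: running the same request with "explain": true lists the output of every token filter in the chain, which shows exactly where synaaa and synbbb get dropped.)

{
  "explain": true,
  "analyzer": "synonym_full_indexing_analyzer",
  "text" : "aaa"
}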

I wonder if it is related to that; I didn't check the Elastic code.
Here is the analyzer:

                "synonym_full_indexing_analyzer": {
                    "type": "custom",
                    "tokenizer": "fr_full_text_nospace_tokenizer",
                    "filter": [
                        "conditional_elision",
                        "lowercase",
                        "asciifolding",
                        "full_indexing_synonyms_syn1",
                        "full_indexing_synonyms_syn2",
                        "flatten_graph"
                    ]
                },
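
The two synonym filters themselves are file-based; roughly like this (a simplified sketch, with placeholder paths; the synonym_graph type turns out to matter, see the end of the thread):

"full_indexing_synonyms_syn1": {
    "type": "synonym_graph",
    "synonyms_path": "resources/v1_02/my_synonyms_1.txt",
    "lenient": "true"
},
"full_indexing_synonyms_syn2": {
    "type": "synonym_graph",
    "synonyms_path": "resources/v1_02/my_synonyms_2.txt",
    "lenient": "true"
}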

Elastic 7.17.6

Hi @antoinelefloch

I put together this example based on your post. Chaining synonym filters seems to be possible. I'm not sure what your filters other than the synonym ones do; maybe they are the real problem.

PUT /synonym_chaining
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "first_synonyms": {
            "type": "synonym",
            "synonyms": ["aaa,bbb"]
          },
          "second_synonyms": {
            "type": "synonym",
            "synonyms": ["aaa,synaaa","bbb,synbbb","ccc,synccc"]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "first_synonyms",
              "second_synonyms"
            ],
            "tokenizer": "whitespace"
          }
        }
      }
    }
  }
}
GET /synonym_chaining/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "bbb"
}

Tokens:

{
  "tokens": [
    {
      "token": "bbb",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "synbbb",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "aaa",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "synaaa",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

When I try similar simple settings, but with synonym files, the second synonym file is not taken into account anymore.
This is Elastic 7.7.1

PUT /synonym_chaining_files
{
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "first_synonyms": {
                        "type": "synonym",
                        "synonyms_path": "resources/v1_02/my_synonyms_1.txt"
                    },
                    "second_synonyms": {
                        "type": "synonym",
                        "synonyms": "resources/v1_02/my_synonyms_2.txt"
                    }
                },
                "analyzer": {
                    "synonym_analyzer": {
                        "filter": [
                            "first_synonyms",
                            "second_synonyms"
                        ],
                        "tokenizer": "whitespace"
                    }
                }
            }
        }
    }
}

Try replacing synonyms with synonyms_path in the second filter.
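
That is, the second filter should be defined the same way as the first, for example:

"second_synonyms": {
    "type": "synonym",
    "synonyms_path": "resources/v1_02/my_synonyms_2.txt"
}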

It works much better, thank you 🙂

I found the cause of my problem. It is related to my analyzer, whose synonym filters are of type "synonym_graph" and not simply "synonym". Here are the filters that reproduce the problem:

  "filter": {
      "first_synonyms": {
          "type": "synonym_graph",
          "synonyms": ["aaa,bbb"],
          "lenient": "true"
      },
      "second_synonyms": {
          "type": "synonym_graph",
          "synonyms": ["aaa,synaaa","bbb,synbbb","ccc,synccc"],
          "lenient": "true"
      }
  }

If I remove "lenient", I get the error:
"term: aaa analyzed to a token (aaa) with position increment != 1 (got: 0)"

So I suppose synonym_graph filters do not support having the same synonym token in two different synonym filters in the same chain.
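
For anyone else hitting this: the chained setup from the reply above does work when both filters use the plain "synonym" type, so one possible workaround is a sketch like the following (with the caveat that plain synonym filters do not handle multi-word synonyms as well as synonym_graph does):

"filter": {
    "first_synonyms": {
        "type": "synonym",
        "synonyms_path": "resources/v1_02/my_synonyms_1.txt"
    },
    "second_synonyms": {
        "type": "synonym",
        "synonyms_path": "resources/v1_02/my_synonyms_2.txt"
    }
}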

