Synonym order with unique filter breaks search

kuseman · May 30, 2023, 7:54am

Hi, I have a weird issue with synonyms combined with a unique token filter that I cannot get my head around.

MVP:

Settings:

                    {
                      "settings": {
                        "index": {
                          "analysis": {
                            "filter": {
                              "synonym": {
                                "type": "synonym_graph",
                                "synonyms_path": "synonyms/synonyms.txt",
                                "updateable": true
                              }
                            },
                            "analyzer": {
                              "synonym": {
                                "tokenizer": "standard",
                                "filter": [
                                  "synonym",
                                  "unique"
                                ]
                              }
                            }
                          }
                        }
                      }
                    }

synonyms.txt

billie jo wilsson,billiejo,billie-jo,billiejo wilson,billie-jo wilson,billiejo wilsson,billie-jo wilsson,billiejoo,billie jo

Mappings:

                {
                    "properties": {
                      "description": {
                        "type": "text",
                        "index": true
                      }
                    }
                }

Documents:

                {
                  "description": "billie-jo wilsson"
                }

Query:

                {
                  "query": {
                    "multi_match": {
                      "query": "billie-jo",
                      "fields": ["description"],
                      "type": "cross_fields",
                      "analyzer": "synonym",
                      "operator": "AND",
                      "boost": 0.4
                    }
                  }
                }

Running this query returns the indexed document, as expected. But after rotating the synonyms one step to the left (i.e. moving the first synonym to the end of the row):

billiejo,billie-jo,billiejo wilson,billie-jo wilson,billiejo wilsson,billie-jo wilsson,billiejoo,billie jo,billie jo wilsson

... no hits are returned. Why is that? Does the order of the synonyms matter?

However, if I remove the unique filter from the analyzer, the query works again, even with the rotated synonyms.

Is this behaviour expected for some reason I don't understand, or is there a synonym issue here?

This was tested on Elasticsearch v7.14.

cbuescher · May 30, 2023, 10:22am

In cases like this it's always a good starting point to check what the actual analysis output looks like. This can be done using the "_analyze" endpoint:

POST /test/_analyze
{
    "field" : "description",
    "text" : "billie-jo wilsson"    
}

This shows you that the input document text is split into three tokens at consecutive positions:

{
    "tokens": [
        {
            "token": "billie",
            "start_offset": 0,
            "end_offset": 6,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "jo",
            "start_offset": 7,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "wilsson",
            "start_offset": 10,
            "end_offset": 17,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}

To get the "synonym" analyzer output, do:

POST /test/_analyze
{
    "analyzer" : "synonym",
    "text" : "billie-jo"
}

which shows you six tokens, three at position 0 and the rest at subsequent positions:

{
    "tokens": [
        {
            "token": "billie",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "billiejo",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 0,
            "positionLength": 9
        },
        {
            "token": "billiejoo",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 0,
            "positionLength": 9
        },
        {
            "token": "jo",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 1
        },
        {
            "token": "wilsson",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 2,
            "positionLength": 7
        },
        {
            "token": "wilson",
            "start_offset": 0,
            "end_offset": 9,
            "type": "SYNONYM",
            "position": 3,
            "positionLength": 6
        }
    ]
}

If you remove the "unique" filter you will see many more tokens, some of which I guess are needed to connect the token graph. The output is a bit complex to parse at a glance. If this needs further explanation, I might have to dig a bit deeper when time allows.
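To make that comparison less painful, here is a rough Python sketch that lists which tokens disappear once "unique" is applied. The helper names (analyze_body, token_keys, dropped_tokens) are made up for illustration. It assumes that an index-scoped _analyze request can reference the index-defined "synonym" filter by name in an ad hoc filter chain; please verify that on your version. Sending the bodies to POST /test/_analyze is left to whatever HTTP client you prefer.

```python
import json
from typing import Dict, List, Tuple


def analyze_body(text: str, filters: List[str]) -> str:
    """Build an _analyze body: standard tokenizer followed by the given filters."""
    return json.dumps({"tokenizer": "standard", "filter": filters, "text": text})


def token_keys(response: Dict) -> List[Tuple[str, int]]:
    """Reduce an _analyze response to (token, position) pairs, in stream order."""
    return [(t["token"], t["position"]) for t in response["tokens"]]


def dropped_tokens(without_unique: Dict, with_unique: Dict) -> List[Tuple[str, int]]:
    """Return the (token, position) pairs present without 'unique' but absent with it."""
    kept = set(token_keys(with_unique))
    return [key for key in token_keys(without_unique) if key not in kept]


# Bodies to send for the two runs (hypothetical usage):
body_without = analyze_body("billie-jo", ["synonym"])
body_with = analyze_body("billie-jo", ["synonym", "unique"])
```

Running this once per version of synonyms.txt and comparing the two dropped_tokens lists should show whether the rotation changes which tokens the filter discards.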

cbuescher · May 30, 2023, 10:23am

And as a short note: synonym matching shouldn't depend on the order of the rules, but the order in which tokens pass through the "unique" filter might change which ones get discarded.
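A toy Python model of that effect, as a sketch only: it assumes the "unique" filter keeps the first occurrence of each term and drops later ones regardless of position (my understanding of its default, with only_on_same_position left at false). The two token streams are hypothetical layouts invented for illustration, not real output from the synonym_graph filter.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, int]  # (term, position); a simplified stand-in for an analysis token


def unique_filter(tokens: Iterable[Token], only_on_same_position: bool = False) -> List[Token]:
    """Keep the first occurrence of each term and drop later duplicates.

    With only_on_same_position=True, a token counts as a duplicate only if
    the same term was already seen at the same position.
    """
    seen = set()
    kept = []
    for term, position in tokens:
        key = (term, position) if only_on_same_position else term
        if key in seen:
            continue
        seen.add(key)
        kept.append((term, position))
    return kept


# Hypothetical layouts: the same terms, placed on different graph positions.
stream_a = [("billie", 0), ("billiejo", 0), ("jo", 1), ("wilsson", 2), ("billie", 3), ("jo", 4)]
stream_b = [("billiejo", 0), ("billie", 0), ("jo", 1), ("billie", 2), ("jo", 3), ("wilsson", 4)]

print(unique_filter(stream_a))
print(unique_filter(stream_b))
```

In this model "wilsson" survives at position 2 for stream_a but at position 4 for stream_b, so "billie jo wilsson" stays contiguous in one case and gets a gap in the other. Whether the real filter chain behaves this way for the two synonym files is something the _analyze output would have to confirm.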

kuseman · May 30, 2023, 10:58am

Yes, that was also my conclusion: the unique filter removes tokens that are needed to match. But that doesn't explain why I get a hit with the first version of the synonyms file but not after shifting it one step, both with the unique filter turned on.

To me, the tokens should remain the same no matter what order the synonyms appear in, right?

system · June 27, 2023, 10:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
