Synonyms and Fuzziness conflict

Hi,

I have an analyzer which contains a synonym filter:

analysis: {
  analyzer: {
    synonym_brazilian_analyzer: {
      tokenizer: "standard",
      filter: [
        "lowercase",
        "asciifolding",
        "my_synonyms",
        "brazilian_stop"
      ]
    }
  },
  filter: {
    brazilian_stop: {
      type: "stop",
      stopwords: "_brazilian_"
    },
    my_synonyms: {
      type: "synonym",
      synonyms_path: "path/synonyms.txt"
    }
  }
}

My path/synonyms.txt contains:

shirt, blouse

My query is

query: {
  multi_match: {
    fields: %w(name^3 tags_names^2),
    query: term,
    fuzziness: "AUTO"
  }
}

I have some documents that contain the word 'shirt' in the tags_names field, and other documents that contain the word 'shirts' in the same field.

If I search for 'shirts', because of the 'fuzziness' setting, both documents containing 'shirt' and 'shirts' are retrieved. However, if I search for 'shirt', only the documents with 'shirt' are returned.
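To spell out what I expected: as far as I understand the documentation, fuzziness "AUTO" allows no edits for terms of 1 or 2 characters, one edit for 3 to 5 characters, and two edits for longer terms. 'shirt' and 'shirts' are one edit apart, so I assumed each would match the other. A minimal Ruby sketch of that reasoning (the helper names are my own, and this is only an approximation of what Elasticsearch does internally):

```ruby
# Plain Levenshtein distance: insertions, deletions and substitutions.
# Note: Elasticsearch also counts a transposition of two adjacent
# characters as a single edit by default; this sketch does not model that.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Maximum number of edits that "AUTO" permits for a query term.
def auto_max_edits(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end

# Would the query term fuzzily match the indexed term under "AUTO"?
def fuzzy_match?(query_term, indexed_term)
  levenshtein(query_term, indexed_term) <= auto_max_edits(query_term)
end
```

By this reasoning, both `fuzzy_match?("shirts", "shirt")` and `fuzzy_match?("shirt", "shirts")` should hold, which is why the second result surprised me.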

If I remove the line

shirt, blouse

from my synonyms file, both queries return the same documents.

Why does this happen only when I search for a word that is listed in my synonyms file? Doesn't Elasticsearch allow fuzzy searches on words declared in a synonym file?
How do I deal with that? Can't I use fuzziness and synonyms together when searching for a term defined in the synonyms file?

Thanks,

Guilherme

Hello Guilherme,

The problem is not really related to fuzziness itself. You can verify this by testing with a Fuzzy Query: you will not be able to reproduce it there.
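A minimal sketch of such a test, in Ruby since your query looks like Ruby. `fuzzy_query_body` is a hypothetical helper of mine, and I am assuming you send the resulting hash as the body of a `_search` request through whatever client you already use:

```ruby
require "json"

# Hypothetical helper: body of a term-level fuzzy query on a single field.
def fuzzy_query_body(field, value)
  { query: { fuzzy: { field => { value: value, fuzziness: "AUTO" } } } }
end

# Print the JSON body for the term that misbehaves with multi_match.
body = fuzzy_query_body("tags_names", "shirt")
puts JSON.pretty_generate(body)
```

Running the same body with "shirts" as the value should let you compare the two result sets directly.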

The issue here is related to the Multi Match Query. It seems that when it finds a mapped synonym, it completely disregards the fuzziness. You can verify this using the Validate API with the rewrite option. For instance:

GET <index_name>/_validate/query?rewrite=true
{
  "query": {
    "multi_match" : {
      "query":  "shirts",
      "fields": [ "name^3", "tags_names^2" ],
      "fuzziness": "AUTO"
    }
  }
}

Do this with both shirts and shirt. You will see that the former is rewritten using fuzziness, while the latter uses the synonym without any fuzziness.
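If you would rather script the comparison, here is a rough Ruby sketch. `validate_body` and `FIELDS` are names I made up, and the field list is copied from your original query; the sketch only builds the two request bodies and leaves the HTTP call to your client:

```ruby
require "json"

# Field list taken from the original multi_match query.
FIELDS = ["name^3", "tags_names^2"].freeze

# Hypothetical helper: the same multi_match body for a given search term.
def validate_body(term)
  { query: { multi_match: { query: term, fields: FIELDS, fuzziness: "AUTO" } } }
end

%w(shirts shirt).each do |term|
  # Send each body to GET <index_name>/_validate/query?rewrite=true
  # and compare the explanations returned for the two terms.
  puts JSON.generate(validate_body(term))
end
```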

TBH, I don't know why this happens (could be a bug?). Maybe @Adrien_Grand has more to tell?

Cheers

Might be related to https://github.com/elastic/elasticsearch/issues/25518?

@jpountz I don't think that's the case here, since I could not reproduce the behavior using a simple fuzzy query. Fuzziness was only broken by the synonym when doing a multi match.

Right, because the fuzzy query discards analysis, it is not aware of the configured synonyms.
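As a toy illustration of that difference (plain Ruby, not Elasticsearch code): `analyze` and `fuzzy_query_terms` are made-up names, the synonym map just mirrors the single line in the synonyms file, and folding and stop words are left out for brevity.

```ruby
# Toy synonym map mirroring "shirt, blouse" from the synonyms file.
SYNONYMS = {
  "shirt"  => ["shirt", "blouse"],
  "blouse" => ["shirt", "blouse"]
}.freeze

# Toy stand-in for the analyzed path used by match / multi_match:
# lowercase the input, split on whitespace, expand synonyms.
def analyze(text)
  text.downcase.split.flat_map { |token| SYNONYMS.fetch(token, [token]) }
end

# Toy stand-in for the term-level fuzzy query: the input is used as is,
# so the synonym filter never gets a chance to run.
def fuzzy_query_terms(text)
  [text]
end

p analyze("shirt")            # two terms at the query stage
p analyze("shirts")           # one term, no synonym configured
p fuzzy_query_terms("shirt")  # one term, analysis skipped
```

In this simplified picture, only the analyzed path ever sees 'shirt' turn into two terms, which would be why the plain fuzzy query cannot reproduce the problem.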

Thank you, everybody.

Since this seems to be an open issue (https://github.com/elastic/elasticsearch/issues/25518), for now I'll remove the fuzziness from my search and use only the synonym filter, as that is more important to me.

Thank you again,

Guilherme
