Synonyms and Fuzziness conflict

Hi,

I have an analyzer which contains a synonym filter:

    analysis: {
      analyzer: {
        synonym_brazilian_analyzer: {
          tokenizer: "standard",
          filter: [
            "lowercase",
            "asciifolding",
            "my_synonyms",
            "brazilian_stop"
          ]
        }
      },
      filter: {
        brazilian_stop: {
          type: "stop",
          stopwords: "_brazilian_"
        },
        my_synonyms: {
          type: "synonym",
          synonyms_path: "path/synonyms.txt"
        }
      }
    }

My path/synonyms.txt contains:

shirt, blouse

My query is

    query: {
      multi_match: {
        fields: %w(name^3 tags_names^2),
        query: term,
        fuzziness: "AUTO"
      }
    }

Some of my documents contain the word 'shirt' in the tags_names field, and other documents contain the word 'shirts' in the same field.

If I search for 'shirts', because of the 'fuzziness' setting, both documents containing 'shirt' and 'shirts' are retrieved. However, if I search for 'shirt', only the documents with 'shirt' are returned.
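A minimal sketch of why fuzziness alone would be expected to match in both directions. It assumes the documented default "AUTO" thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two) and uses plain Levenshtein distance; Elasticsearch by default also counts a transposition as a single edit, which does not matter for this pair. The function names are hypothetical helpers, not part of any Elasticsearch API.

```python
def auto_max_edits(term: str) -> int:
    """Edits allowed for a term under fuzziness "AUTO" (assumed default thresholds 3 and 6)."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    """True if the indexed term is within the AUTO edit budget of the query term."""
    return levenshtein(query_term, indexed_term) <= auto_max_edits(query_term)


if __name__ == "__main__":
    print(fuzzy_matches("shirts", "shirt"))
    print(fuzzy_matches("shirt", "shirts"))
```

Under these assumptions the two terms are one edit apart and within the budget in both directions, so the asymmetry described above cannot come from the edit distance itself.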

If I remove the line

shirt, blouse

from my synonyms file, both queries return the same documents.

Why does this happen only when I search for a word that is listed in my synonyms file? Doesn't Elasticsearch allow fuzzy searches on words declared in a synonyms file?
How do I deal with that? Can't I use fuzziness and synonyms together when the search term is defined in the synonyms file?

Thanks,

Guilherme

Hello Guilherme,

The problem is not really related to fuzziness itself. You can verify this by testing with a plain fuzzy query: you will not be able to reproduce it.

The issue here is related to the multi_match query. It seems that when it finds a mapped synonym, it disregards the fuzziness completely. You can verify this using the Validate API with the rewrite option. For instance:

GET <index_name>/_validate/query?rewrite=true
{
  "query": {
    "multi_match": {
      "query": "shirts",
      "fields": [ "name^3", "tags_names^2" ],
      "fuzziness": "AUTO"
    }
  }
}

Do this with both shirts and shirt. You will see that the former is rewritten using fuzziness, while the latter uses the synonym without any fuzziness.
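If you want to script the comparison, here is a minimal sketch that only builds the two requests; it does not send them. The index name my_index and the helper name are placeholders, and the field names are taken from the query in the original post.

```python
import json


def build_validate_request(index_name: str, term: str):
    """Return (method, path, body) for a _validate/query call with rewrite=true."""
    path = f"/{index_name}/_validate/query?rewrite=true"
    body = {
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["name^3", "tags_names^2"],
                "fuzziness": "AUTO",
            }
        }
    }
    return "GET", path, body


if __name__ == "__main__":
    # "my_index" is a placeholder; substitute your own index name.
    for term in ("shirts", "shirt"):
        method, path, body = build_validate_request("my_index", term)
        print(method, path)
        print(json.dumps(body, indent=2))
```

Send each body with whatever client you already use and compare the rewritten query in the two responses.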

TBH, I don't know why this happens (could be a bug?). Maybe @Adrien_Grand has more to tell?

Cheers

Might be related to https://github.com/elastic/elasticsearch/issues/25518?

@jpountz I don't think that's the case here, since I could not reproduce the behavior using a simple fuzzy query. Fuzziness was only broken by the synonym when doing a multi_match.

Right, because the fuzzy query discards analysis, it is not aware of the configured synonyms.
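To make the difference concrete, here is a rough sketch of the two request bodies being compared. The fuzzy query is a term-level query, so the value is used as given and never passes through the analyzer or its synonym filter, whereas multi_match analyzes the query text first. The helper names are hypothetical; the field names come from the original post.

```python
def fuzzy_query(field: str, term: str) -> dict:
    """Term-level fuzzy query: the term is not analyzed, so synonyms never apply."""
    return {"query": {"fuzzy": {field: {"value": term, "fuzziness": "AUTO"}}}}


def multi_match_query(fields, term: str) -> dict:
    """Full-text multi_match query: the text is analyzed, so the synonym filter runs."""
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": list(fields),
                "fuzziness": "AUTO",
            }
        }
    }


if __name__ == "__main__":
    print(fuzzy_query("tags_names", "shirt"))
    print(multi_match_query(["name^3", "tags_names^2"], "shirt"))
```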

Thank you, everybody.

As it seems to be an open issue (https://github.com/elastic/elasticsearch/issues/25518), for now I'll remove the fuzziness from my search and use only the synonym filter, as it's more important for me.

Thank you again,

Guilherme