Fuzzy not matching with two substitutions (distance=2)

In ES 7.4, I have created a simple index with default mapping.
There is a string field "authors". I have indexed a doc where "authors" is "Benjamin".

Now I use this query:

GET my_index/_search
{
  "query": {
    "fuzzy": {
      "authors": {
        "value":"<testvalue>",
        "fuzziness": 2
      }
    }
  }
}

I get a match for the following <testvalue>:

  • Benjaman
  • enjaman
  • Benjamni
  • enjamni (!)

But I get NO match for Banjaman!

How does that match with the principle of Levenshtein distance?

Banjaman has a distance of 2, just like e.g. enjamni, so it is not logical.

By the way, a transposition such as in -> ni is not part of "pure" Levenshtein distance but of a derived concept ("Damerau-Levenshtein distance"); this confused me a bit at first.

The fuzzy query is a term-level query, which means it does not analyze the query terms. As a result, this query is case sensitive.

The author Benjamin, when indexed using the default mapping, does get analyzed with the standard analyzer. As a result, what gets indexed is the all-lowercase term benjamin.

The term Banjaman is 3 edits away from benjamin: the substitutions a -> e and a -> i, plus the upper/lower case B. As a result, the fuzzy query with a fuzziness of 2 returns no hits.
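To double-check the arithmetic, here is a minimal Python sketch that computes the edit distance with adjacent transpositions counted as one edit (the "optimal string alignment" variant of Damerau-Levenshtein). It is only an illustration, not how Lucene evaluates fuzzy queries internally; the function name osa_distance is made up for this example, and the indexed term is assumed to be the lowercased benjamin as described above.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute and an
    adjacent transposition each cost 1 (case sensitive)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # adjacent transposition, e.g. "ni" <-> "in"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


indexed_term = "benjamin"  # assumed output of the standard analyzer for "Benjamin"
for value in ["Benjaman", "enjaman", "Benjamni", "enjamni", "Banjaman"]:
    as_typed = osa_distance(value, indexed_term)            # what the fuzzy query compares
    lowercased = osa_distance(value.lower(), indexed_term)  # after analysis of the query term
    print(f"{value}: as typed = {as_typed}, lowercased = {lowercased}")
```

The first four values come out at distance 2 as typed, while Banjaman comes out at 3 as typed and 2 once lowercased, which is consistent with the behaviour described in the question.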

How to solve this? Use the match query with fuzziness instead. The match query does analyze the query terms, which gives you case insensitivity:

GET my_index/_search
{
  "query": {
    "match": {
      "authors":{
        "query": "Banjaman",
        "fuzziness": 2
      }
    }
  }
}

Awesome explanation, thanks!
