Weird behaviour of fuzziness in Elasticsearch

I have the following document in my_index:

{
  "title": "weiß",
  "id": 1
}

Consider the following query:

GET my_index/_search?explain=true
{
  "_source": ["title"], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": <query>,
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}

I can't figure out why query="weißee" matches the above document but query="weißer" doesn't. Since both are at an edit distance of 2, I would expect both to match. Even if I set fuzziness to 3, 4, and so on, weißer still does not match.

Here are the mapping and analyzer settings of the 'title' field in my_index:

Mapping:

"title" : {
  "type" : "text",
  "analyzer" : "my_analyzer"
},

Setting:

"my_analyzer" : {
  "filter" : [
    "lowercase",
    "asciifolding"
  ],
  "type" : "custom",
  "tokenizer" : "standard"
}

Output of analyzer on title field
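
The analyzer output can be inspected with the _analyze API. This is a minimal sketch, assuming the index and analyzer names shown above. With the asciifolding filter, ß is expected to fold to ss, so the indexed token should come out as weiss:

```
GET my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "weiß"
}
```

Running the same request with "weißee" and "weißer" as the text shows the terms that the fuzzy match actually compares against the indexed token.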

Hi @alliswell

I can't reproduce the problem with the information you provided. Here are my tests:

Mapping:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Document:

POST my_index/_doc
{
  "title":"weiß"
}

Query:

I get a result when I search for weißer.

GET my_index/_search
{
  "_source": ["title"], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": {
              "query": "weißer",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}

Thanks for the reply, @andre.coelho. We were able to resolve this. In my post I shared only one sample document, but our actual index contains a large number of documents. For fuzzy matching, Elasticsearch sets "max_expansions" to 50 by default, which caps the number of terms the fuzzy query will expand to. Because of this cap, Elasticsearch can sometimes fail to retrieve relevant documents.
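
For reference, the limit can be raised per query through the max_expansions parameter of the match query. This is a minimal sketch, assuming the same index and field as above. The value 200 is an arbitrary illustration, not a recommendation, and larger values make the query more expensive:

```
GET my_index/_search
{
  "_source": ["title"],
  "query": {
    "match": {
      "title": {
        "query": "weißer",
        "fuzziness": "AUTO",
        "max_expansions": 200
      }
    }
  }
}
```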

ref:
