Example showing how to best use fuzzy multiterm/phrase matching?

I'm looking for a complete example showing how you can effectively perform fuzzy phrase matching, to get useful results as a user types text.

So far, the closest I've got is splitting the search query string into words and creating one clause per word in a span_near query. It doesn't work all that well, unfortunately, specifically when someone has typed only one or two characters in the second term. I've pasted the query below.

{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "name": {"fuzziness": 2, "value": "word1"}
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "name": {"fuzziness": 2, "value": "partial2"}
              }
            }
          }
        }
      ],
      "slop": 3,
      "in_order": true
    }
  }
}

I have the strings "Mexico", "Mexico City", "Kuwait", and "Kuwait City" in my index. When someone searches for "Kuwait C", I expect "Kuwait City" to score the highest, but in fact the query returns zero results for "Kuwait C" and "Kuwait Ci" (not even just "Kuwait"). If the user types "Kuwait" or "Kuwait Cit", I get the result I expect ("Kuwait" at the top for the former, "Kuwait City" at the top for the latter). FWIW, "name" is simply a "text" type in my mapping.

Naturally, I've also tried something simpler, like:

{
  "query": {
    "match": {
      "name": {
        "query": "Kuwait C",
        "fuzziness": 2
      }
    }
  }
}

but that returns "Kuwait" above "Kuwait City", followed by a bunch of zero-score non-matches. This is marginally better because a) at least it returns something and b) I can filter out the zero-score results. However, it means the user won't start seeing expected results until they type some more characters. I could perform my own post-search processing to sort the better match to the top but I feel like that's going to just lead to more trouble down the road, mixing ES and my own rules.

For what it is worth, I've also tried the "search_as_you_type" mapping type, and the results were the same. I've tried using fuzzy for the first clause and prefix for the second but I get zero results -- probably an entirely wrong approach.
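For reference, a fuzzy-then-prefix span query of the kind described above might look like the sketch below. It is an illustration only, not the query that was actually run. It rests on one assumption worth calling out: fuzzy and prefix are term-level queries, so their values are not passed through the field's analyzer. If "name" uses the default standard analyzer, the indexed terms are lowercase, and a prefix value of "C" would not match the indexed term "city". The sketch therefore lowercases both values by hand.

```json
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "name": {"fuzziness": 2, "value": "kuwait"}
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "prefix": {
                "name": {"value": "c"}
              }
            }
          }
        }
      ],
      "slop": 3,
      "in_order": true
    }
  }
}
```

If the prefix expands to a large number of terms, span_multi can run into the boolean clause limit; the documented way to cap that is to set a top_terms_* rewrite on the wrapped multi-term query.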

I'm stumped. What's the secret? I am running 7.13.0.

Forgot to mention I've also tried:

{
  "query": {
    "multi_match": {
      "query": "Kuwat C",
      "type": "bool_prefix",
      "fields": [
        "name",
        "name._2gram",
        "name._3gram"
      ],
      "fuzziness": 1
    }
  }
}

(and without the ngrams). This ends up being pretty bad -- it sorts "Kuwait" to the top, followed by "Kuwait City", followed by every document where name has a C in it. Additionally, searching for "Kuwait City" returns "Kuwait" before "Kuwait City".

First, take a look at the search_as_you_type field type; that may save you some setup trouble, and you can still run prefix queries.

Second, you can always wrap your queries in a bool query with a must clause, but add a should clause to score a direct match higher and thus boost it in the results.
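As a rough sketch of that must/should split (one possible reading, since the queries themselves are not spelled out here): the must clause holds a lenient fuzzy query that decides what matches, and the should clause holds a stricter phrase-style query that only adds score. The field "name" and its "_2gram"/"_3gram" subfields are taken from the question and assume a search_as_you_type mapping. The choice of match_phrase_prefix for the should clause is an assumption, not something stated in this thread.

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "Kuwait C",
            "type": "bool_prefix",
            "fields": ["name", "name._2gram", "name._3gram"],
            "fuzziness": 1
          }
        }
      ],
      "should": [
        {
          "match_phrase_prefix": {
            "name": {"query": "Kuwait C"}
          }
        }
      ]
    }
  }
}
```

Documents that contain the typed words in order, with the last word treated as a prefix, pick up the extra should score, so "Kuwait City" would be expected to rank above "Kuwait" for this input. Note that match_phrase_prefix does not apply fuzziness, so the boost only helps when the earlier words are spelled correctly.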

I would try to stay away from tokenizing the terms myself as much as possible.

Even though it is a little dated, the concepts in Elasticsearch: The Definitive Guide are still fully valid, and it has very good chapters on full-text search.

See the chapters "Dealing with Human Language", "Full-Text Search", and "Partial Matching" in Elasticsearch: The Definitive Guide [2.x].

I had been using search_as_you_type but I found I needed a way to do fuzzy searching and as far as I can tell that's not supported without additional work. For example, if someone searches for Kuwat City they'd get results for any document where City appears in the name, and often Kuwait City would appear in the middle instead of the top because of the typo. Adding fuzziness: 1 to the query helps a bit, but Kuwait still appears above Kuwait City when searching for Kuwat City.

When wrapping the queries, what query would you suggest I put in must and what in should?

Thanks for the documents, I'll take a look. I did indeed skip over anything older than ~6.x just in case it was out of date -- I've mistakenly used old ES information before and spun wheels for a while trying to get things to work.
