Example showing how to best use fuzzy multiterm/phrase matching?

nahh · July 13, 2021, 10:40pm

I'm looking for a complete example showing how you can effectively perform fuzzy phrase matching, to get useful results as a user types text.

So far, the closest I've got is splitting the search query string into words and creating one clause per word in a span_near query. Unfortunately it doesn't work very well, especially when someone has typed only one or two characters of the second term. I've pasted the query below.

```json
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "name": {"fuzziness": 2, "value": "word1"}
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "name": {"fuzziness": 2, "value": "partial2"}
              }
            }
          }
        }
      ],
      "slop": 3,
      "in_order": true
    }
  }
}
```

I have the strings "Mexico", "Mexico City", "Kuwait", and "Kuwait City" in my index. When someone searches for "Kuwait C", I expect "Kuwait City" to score highest, but the query actually returns zero results for "Kuwait C" and "Kuwait Ci" (not even plain "Kuwait"). If the user types "Kuwait" or "Kuwait Cit", I get the results I expect ("Kuwait" at the top for the former, "Kuwait City" at the top for the latter). FWIW, "name" is simply a "text" field in my mapping.
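
One possible explanation for this pattern, offered as an assumption since the posted query uses placeholder values: fuzzy is a term-level query, and Elasticsearch does not analyze the input of term-level queries. If the raw, capitalized input ("C", "Ci") went into the fuzzy `value`, it is compared against the indexed token "city", which the default standard analyzer lowercased at index time. Because span_near needs every clause to match, one failing clause drops the whole document, which would also explain why plain "Kuwait" disappears. The sketch below computes plain Levenshtein distance (Lucene also counts transpositions by default, which does not change these particular pairs); it does not model Lucene's fuzzy rewrite itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


FUZZINESS = 2     # value used in the span_near clauses above
INDEXED = "city"  # assumption: standard analyzer lowercased "City" at index time

# Raw (capitalized) input versus lowercased input for the partial second word.
for typed in ["C", "Ci", "Cit", "c", "ci", "cit"]:
    d = levenshtein(typed, INDEXED)
    print(f"{typed!r} -> {INDEXED!r}: {d} edits, within fuzziness: {d <= FUZZINESS}")
```

With case preserved, "C" and "Ci" are 4 and 3 edits from "city" and fall outside fuzziness 2, while "Cit" is 2 edits and matches, which lines up with the results described above. Lowercasing the input first brings "ci" within range, though a single "c" (3 edits) still would not match.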

Naturally, I've also tried something simpler, like:

```json
{
  "query": {
    "match": {
      "name": {
        "query": "Kuwait C",
        "fuzziness": 2
      }
    }
  }
}
```

but that returns "Kuwait" above "Kuwait City", followed by a bunch of zero-score non-matches. This is marginally better because a) at least it returns something and b) I can filter out the zero-score results. However, it means the user won't start seeing expected results until they type some more characters. I could perform my own post-search processing to sort the better match to the top but I feel like that's going to just lead to more trouble down the road, mixing ES and my own rules.
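
Filtering out zero-score hits does not have to happen in application code: the top-level `min_score` search parameter excludes documents scoring below a threshold. A minimal sketch, with the request body built as a Python dict; the threshold value is an arbitrary assumption. It only trims the zero-score tail and does not change the "Kuwait" versus "Kuwait City" ordering.

```python
import json

# Hypothetical threshold: any positive value drops hits that scored 0.
MIN_SCORE = 0.001

body = {
    "min_score": MIN_SCORE,  # top-level search parameter, not part of "query"
    "query": {
        "match": {
            "name": {"query": "Kuwait C", "fuzziness": 2}
        }
    },
}

print(json.dumps(body, indent=2))
```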

For what it is worth, I've also tried the "search_as_you_type" mapping type, and the results were the same. I've tried using fuzzy for the first clause and prefix for the second but I get zero results -- probably an entirely wrong approach.
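
For reference, the fuzzy-then-prefix span_near variant could be built along these lines. This is a sketch under assumptions: `build_span_query` is a hypothetical helper, and lowercasing plus whitespace splitting is only a crude stand-in for the standard analyzer. The lowercasing matters because neither fuzzy nor prefix analyzes its input, so an uppercase "C" prefix would never match the indexed token "city".

```python
import json


def build_span_query(text: str, field: str = "name") -> dict:
    """Hypothetical helper: fuzzy span clauses for completed words and a
    prefix span clause for the last (possibly partial) word."""
    # Term-level queries skip analysis, so approximate the standard
    # analyzer by lowercasing and splitting on whitespace.
    words = text.lower().split()
    if not words:
        raise ValueError("empty query text")
    clauses = [
        {"span_multi": {"match": {"fuzzy": {field: {"value": w, "fuzziness": 2}}}}}
        for w in words[:-1]
    ]
    clauses.append(
        {"span_multi": {"match": {"prefix": {field: {"value": words[-1]}}}}}
    )
    return {
        "query": {
            "span_near": {"clauses": clauses, "slop": 3, "in_order": True}
        }
    }


print(json.dumps(build_span_query("Kuwait C"), indent=2))
```

One caveat from the span_multi documentation: a short prefix such as "c" can expand to many terms and hit the boolean clause limit, in which case a `top_terms_*` rewrite on the wrapped query is the suggested workaround.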

I'm stumped. What's the secret? I am running 7.13.0.

nahh · July 13, 2021, 10:45pm

Forgot to mention I've also tried:

```json
{
  "query": {
    "multi_match": {
      "query": "Kuwat C",
      "type": "bool_prefix",
      "fields": [
        "name",
        "name._2gram",
        "name._3gram"
      ],
      "fuzziness": 1
    }
  }
}
```

(and the same query without the ngram subfields). This ends up being pretty bad: it sorts "Kuwait" to the top, followed by "Kuwait City", followed by every document whose name has a C in it. Additionally, searching for "Kuwait City" returns "Kuwait" before "Kuwait City".

spinscale · July 14, 2021, 8:53am

First, take a look at the search_as_you_type field type; it may save you some setup trouble, and you can still run prefix queries.
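
A minimal sketch of that setup, written as Python dicts that serialize to the JSON request bodies. Only the field name `name` comes from the thread; the rest is an assumed example. Elasticsearch derives the `._2gram`, `._3gram` and `._index_prefix` subfields from a single search_as_you_type field (with the default `max_shingle_size` of 3).

```python
import json

# Index creation body: one search_as_you_type field named "name".
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "search_as_you_type"}
        }
    }
}

# Typical query against it: bool_prefix treats the last term as a prefix.
query = {
    "query": {
        "multi_match": {
            "query": "Kuwait C",
            "type": "bool_prefix",
            "fields": ["name", "name._2gram", "name._3gram"],
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```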

Second, you can always wrap your queries in a bool query with a must clause, and add a should clause that scores a direct match higher, boosting it in the results.
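
One way that must/should split could look, as a sketch rather than a tested recipe. `build_query` is a hypothetical helper, the `._2gram`/`._3gram` fields assume a search_as_you_type mapping, and the boost of 2 is arbitrary. The must clause provides broad, typo-tolerant recall. The should clause adds score when the words appear in order with the last one as a prefix, so "Kuwait C" should favor "Kuwait City" over "Kuwait".

```python
import json


def build_query(text: str, field: str = "name") -> dict:
    """Hypothetical helper: fuzzy bool_prefix for recall (must) plus a
    phrase-prefix clause for ranking (should)."""
    return {
        "query": {
            "bool": {
                # must: every hit has to match this clause (recall).
                "must": {
                    "multi_match": {
                        "query": text,
                        "type": "bool_prefix",
                        "fields": [field, f"{field}._2gram", f"{field}._3gram"],
                        "fuzziness": 1,
                    }
                },
                # should: optional; matching it only adds score (ranking).
                "should": {
                    "match_phrase_prefix": {
                        field: {"query": text, "boost": 2}
                    }
                },
            }
        }
    }


print(json.dumps(build_query("Kuwait C"), indent=2))
```

The trade-off is that match_phrase_prefix has no fuzziness option, so typo'd input such as "Kuwat C" gets no boost from the should clause and is ranked by the must clause alone.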

I would try to stay away from tokenizing the terms myself as much as possible.

Even though it is a little dated, the concepts in "Elasticsearch: The Definitive Guide" are still fully valid. Its chapter on full-text search is very good.

nahh · July 14, 2021, 3:20pm

I had been using search_as_you_type, but I found I needed fuzzy searching, and as far as I can tell that isn't supported without additional work. For example, if someone searches for "Kuwat City", they get results for any document with "City" in the name, and because of the typo "Kuwait City" often appears in the middle instead of at the top. Adding "fuzziness": 1 to the query helps a bit, but "Kuwait" still appears above "Kuwait City" when searching for "Kuwat City".

When wrapping the queries, what query would you suggest I put in must and what in should?

Thanks for the documents, I'll take a look. I did indeed skip over anything older than ~6.x just in case it was out of date -- I've mistakenly used old ES information before and spun wheels for a while trying to get things to work.

system · August 11, 2021, 3:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
