"Strange" behaviour of multi_match with fuzziness


#1

I have three documents with two fields, title and brand:

  1. Title: Björn Borg R400 Low SNB schoenen heren marine/groen/wit, Brand: Björn Borg
  2. Title: Baby Born Schoenen 2-Pack Assorti, Brand: Baby Born
  3. Title: Born In The Echoes, Brand: Virgin Records

Now I'm running a multi_match query against them with the text "bjorn borg schoenen", boosting title by 5.0 and brand by 2.0:

GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "bjorn borg schoenen",
            "fields": [
              "title^5.0",
              "brand^2.0"
            ],
            "type": "best_fields",
            "operator": "and",
            "fuzziness": "auto"
          }
        }
      ]
    }
  }
}

I would expect the results in the order listed above: 1, 2, 3. Instead, I get them back in the order 2, 3, 1.

The only explanation I can think of is that the word "born" is one Levenshtein edit away from both "bjorn" and "borg", so it matches twice, and the title of the 2nd document also contains "schoenen". For the 3rd document, the same double match means "born" makes up 25% of the title (50% when counted twice), versus 3 words out of 10 (30%) in the 1st document. But of course this is not the behaviour I would expect, let alone the behaviour I want.
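The edit-distance part of this reasoning can be checked offline with a small Python sketch. It uses plain Levenshtein distance and the documented AUTO rule (AUTO:3,6: terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two). The tokenizer is an assumption standing in for whatever analyzer the real mapping uses: it lowercases and splits on non-word characters and, like Elasticsearch's standard analyzer, does not fold ö to o. Elasticsearch's fuzzy_transpositions option (on by default) also counts swapping two adjacent characters as a single edit; this sketch ignores that.

```python
import re
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def auto_max_edits(term: str) -> int:
    """Edits allowed by fuzziness AUTO (AUTO:3,6): 0..2 -> 0, 3..5 -> 1, >5 -> 2."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def tokenize(text: str) -> List[str]:
    """Assumed analyzer stand-in: lowercase, split on non-word chars, no accent folding."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


QUERY_TERMS = ["bjorn", "borg", "schoenen"]
TITLES = {
    1: "Björn Borg R400 Low SNB schoenen heren marine/groen/wit",
    2: "Baby Born Schoenen 2-Pack Assorti",
    3: "Born In The Echoes",
}


def fuzzy_hits(title: str) -> List[Tuple[str, str, int]]:
    """(query term, title token, distance) for every pair within the AUTO edit budget."""
    hits = []
    for token in tokenize(title):
        for term in QUERY_TERMS:
            distance = levenshtein(term, token)
            if distance <= auto_max_edits(term):
                hits.append((term, token, distance))
    return hits


if __name__ == "__main__":
    for doc_id, title in TITLES.items():
        hits = fuzzy_hits(title)
        n_tokens = len(tokenize(title))
        print(doc_id, hits, f"{len(hits)}/{n_tokens} = {len(hits) / n_tokens:.0%}")
```

Under these assumptions, "born" matches both "bjorn" and "borg" in documents 2 and 3 (two hits out of four tokens in document 3, against three out of ten in document 1), and "Björn" in document 1 itself only reaches "bjorn" through one edit (ö to o). The sketch also finds nothing in document 3 within two edits of "schoenen"; since best_fields with operator "and" requires every term to match within a single field, document 3 matching at all suggests the real analysis differs from this stand-in. The percentages are only a rough proxy for Lucene's field-length normalization, not the actual scoring formula.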

Can anyone shed some light on this, or even suggest a way to "fix" it?


(Mark Harwood) #2

Use the explain API to get the raw scoring stats.
My guess is that it comes down to the IDF of the terms selected. multi_match tends to reward the most bizarre context for a word when there are matches in multiple fields, because rare = good.
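To make "rare = good" concrete: under BM25 (the default similarity since Elasticsearch 5.0), IDF is computed per field, so the same word can be worth different amounts in title and brand. The sketch below computes Lucene's BM25 IDF for exact terms over the three example documents. It assumes a simple lowercase-and-split analyzer and ignores boosts, term frequency, length normalization and how fuzzy expansions are weighted, so the explain output remains the authoritative source for the real numbers.

```python
import math
import re
from typing import Dict, List

# The three example documents from the question.
DOCS: List[Dict[str, str]] = [
    {"title": "Björn Borg R400 Low SNB schoenen heren marine/groen/wit", "brand": "Björn Borg"},
    {"title": "Baby Born Schoenen 2-Pack Assorti", "brand": "Baby Born"},
    {"title": "Born In The Echoes", "brand": "Virgin Records"},
]


def tokenize(text: str) -> List[str]:
    """Assumed analyzer stand-in: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def doc_freq(field: str, term: str) -> int:
    """Number of documents whose `field` contains `term` exactly."""
    return sum(1 for doc in DOCS if term in tokenize(doc[field]))


def bm25_idf(doc_count: int, df: int) -> float:
    """Lucene's BM25 IDF: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - df + 0.5) / (df + 0.5))


if __name__ == "__main__":
    n = len(DOCS)
    for term in ["born", "borg", "schoenen"]:
        for field in ["title", "brand"]:
            df = doc_freq(field, term)
            if df:  # skip fields where the term never occurs
                print(f"{term!r} in {field}: df={df}, idf={bm25_idf(n, df):.3f}")
```

Here "born" occurs in two of the three titles but only one of the three brands, so a match on it in brand carries roughly twice the IDF (about 0.98 versus 0.47). With so few documents, also keep in mind that Elasticsearch computes these statistics per shard by default, so on a multi-shard index the real values can differ further.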
Try the "cross_fields" type and remove the boosts on the field names. cross_fields uses some subtle boosting tricks to ensure that a match for a word in its most likely field scores higher than a match in its least likely field, and these tweaks are undone if you apply field-level boosts to all words.
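Following that advice, the request could look something like the Python sketch below, which builds the search body as a dict and posts it with the standard library. Assumptions: an unsecured cluster at http://localhost:9200 (hypothetical), the my_index index from the question, and the helper names cross_fields_body and search, which are illustrative only. The single-clause bool/must wrapper is dropped for brevity, the field boosts are removed as suggested, and "explain": true asks Elasticsearch to attach a scoring explanation to every hit. The Elasticsearch multi_match documentation states that fuzziness cannot be used with cross_fields, so it is left out here; that also gives up the typo tolerance the original query relied on.

```python
import json
import urllib.request
from typing import Any, Dict

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster
INDEX = "my_index"                # index name from the question


def cross_fields_body(text: str, explain: bool = True) -> Dict[str, Any]:
    """Search body: cross_fields over unboosted fields, optionally with explain."""
    return {
        "explain": explain,  # attach a per-hit _explanation with the raw scoring stats
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title", "brand"],  # no ^boosts, so cross_fields' own blending applies
                "type": "cross_fields",
                "operator": "and",  # every term must match in at least one field
                # no "fuzziness": it cannot be combined with cross_fields
            }
        },
    }


def search(body: Dict[str, Any], url: str = ES_URL, index: str = INDEX) -> Dict[str, Any]:
    """POST the body to /<index>/_search and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = cross_fields_body("bjorn borg schoenen")
    print(json.dumps(body, indent=2))
    # Against a running cluster:
    # for hit in search(body)["hits"]["hits"]:
    #     print(hit["_id"], hit["_score"], hit["_explanation"]["description"])
```

Comparing the _explanation trees for documents 1 and 2 should show directly which terms and fields drive each score.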

