Minimum should match percentage not returning desired results

I am seeing odd behavior when running a multi_match query with a percentage minimum_should_match. As a sample use case: when I search for New York in a multi_match query with minimum_should_match set to 90%, it returns docs containing New York and also docs containing only New. That seems wrong to me. Is something broken? This doesn't make sense to me.
What am I missing here?
I am using ES 5.4 and the standard analyzer in this case. Any help is appreciated.

GET dummy_index/test/_search
{
  "query": {
    "multi_match": {
      "query": "New York",
      "fields": [
        "address"
      ],
      "minimum_should_match": "90%"
    }
  }
}

It returns results like:

{
  "_index": "dummy_index",
  "_type": "screening_list:dpl",
  "_id": "1",
  "_score": 1.0717683,
  "_source": {
    "address": "New York"
  }
},
{
  "_index": "dummy_index",
  "_type": "screening_list:dpl",
  "_id": "2",
  "_score": 0.17426978,
  "_source": {
    "address": "New"
  }
},

From the docs

“The number computed from the percentage is rounded down and used as the minimum.”

2 terms * 90% = 1.8, rounded down = 1
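The rounding can be checked with a small sketch. It is an illustration only, not Elasticsearch's actual code; the function name is hypothetical and it handles just the positive-percentage form of minimum_should_match quoted above.

```python
def min_should_match(term_count: int, percent: int) -> int:
    """Required number of matching terms for a positive percentage.

    Hypothetical helper: mirrors the documented "rounded down" rule
    using exact integer arithmetic.
    """
    if not 0 <= percent <= 100:
        raise ValueError("this sketch only handles percentages from 0 to 100")
    # Floor division rounds down, e.g. 2 * 90 // 100 == 1.
    return (term_count * percent) // 100


print(min_should_match(2, 90))   # two terms at 90%
print(min_should_match(2, 100))  # two terms at 100%
print(min_should_match(10, 90))  # ten terms at 90%
```

With only two query terms, any percentage below 100% rounds down to at most one required term, which is why the doc containing just New still matches.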

Ah, got it.
That explains why my search results look wrong. Is there any way to change this default behavior, or do I have to use multiple conditions in a should clause for this kind of scenario?

I’m not sure what your required logic is. Maybe start with a description of what you want to see for various queries and we can think about the JSON later.

Sure. I am doing a multi-keyword search; the text may have two or more tokens.
For an exact match, or when all keywords are found in a field (i.e. a 100% match), it should give a score of 100. A 90% match should give a score of 90, and similarly a 50% match should give 50. In this particular example, where I am searching for New York, I get a score of 100 for an exact match but a score of 90 when the field contains only the keyword New. Ideally that should be 50, because only 1 of the 2 tokens matches.
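The scoring described here, read literally, is the share of query tokens found in the field. A minimal sketch of that calculation follows; it runs outside Elasticsearch, the function name is hypothetical, and lowercasing plus whitespace splitting is only a rough stand-in for the standard analyzer on simple text.

```python
def term_coverage(query: str, field_text: str) -> float:
    """Percentage of distinct query tokens that also occur in the field."""
    # Assumption: lowercase + whitespace split approximates the analyzer.
    query_tokens = set(query.lower().split())
    field_tokens = set(field_text.lower().split())
    if not query_tokens:
        return 0.0
    matched = query_tokens & field_tokens
    return 100.0 * len(matched) / len(query_tokens)


print(term_coverage("New York", "New York"))  # both tokens present
print(term_coverage("New York", "New"))       # one of two tokens present
```

This is not how _score is computed; it only makes the requested 100/50 behavior concrete.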

It’s probably a mistake to think of the scoring as purely “percentage of terms matched”. Not all words are equal and given a search like “Fotherington street” it is better to match on the rare “Fotherington” than the commonplace “street”. So the rareness (IDF) of a term is typically a strong scoring factor. Number of terms matched is too.
“Minimum should match” is typically only used to trim the long tail of low scoring docs rather than change the default scoring behaviour of these docs.
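To illustrate the point about rareness, here is a sketch of the IDF component of BM25 (assumed here to be the similarity in use; it is the default since Elasticsearch 5.0). The document counts are made up for illustration and do not come from any real index.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF term of BM25: grows as fewer documents contain the term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index of 1000 docs: a rare term versus a common one.
doc_count = 1000
print(bm25_idf(2, doc_count))    # e.g. "fotherington" in 2 docs
print(bm25_idf(400, doc_count))  # e.g. "street" in 400 docs
```

The rare term gets a much larger weight, so two docs that each match one of two query terms can score very differently.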
