Score based on the matched multi-match counts

Hi,

I was wondering how to get the count of matching fields.
For example, say you have a query with some kind of boosting, like the following:

GET hotels/_search
{
  "query": {
    "multi_match": {
      "query": "Spain",
      "fields": ["city.title^2", "country.title", "name^3"]
    }
  }
}

Let's say name and country.title were a match. How can I get back a count of 2 as the score?
In other words, the score should be the number of fields that matched the query.

Hey Peter,

If you do not care about the BM25 score and just want a score based on the number of matching fields, you can switch to the boolean similarity. This will calculate a score of 0 or 1, depending on whether a field is a match or not.

You can apply boolean similarity per field in your mappings, or you can make it the default similarity for your index.
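As a rough sketch of both options, the request bodies could look like the Python dicts below, which you would serialize and send when creating the index. The field types (text), the object layout of city and country, and the typeless mappings format (Elasticsearch 7 or later) are assumptions here, not something taken from your actual index.

```python
import json

# Assumed field definition: a text field scored with the boolean similarity.
BOOLEAN_TEXT = {"type": "text", "similarity": "boolean"}

# Option 1: boolean similarity per field (hypothetical layout of "hotels").
per_field_body = {
    "mappings": {
        "properties": {
            "name": dict(BOOLEAN_TEXT),
            "city": {"properties": {"title": dict(BOOLEAN_TEXT)}},
            "country": {"properties": {"title": dict(BOOLEAN_TEXT)}},
        }
    }
}

# Option 2: boolean as the default similarity for the whole index.
default_similarity_body = {
    "settings": {"index": {"similarity": {"default": {"type": "boolean"}}}}
}

# Print the JSON that would be sent as the body of the index creation request.
print(json.dumps(per_field_body, indent=2))
print(json.dumps(default_similarity_body, indent=2))
```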

If you then combine that with the multi_match most_fields type, you will get an overall score that represents the total number of matching fields:

GET hotels/_search
{
  "query": {
    "multi_match": {
      "query": "Spain",
      "fields": ["city.title", "country.title", "name"],
      "type": "most_fields"
    }
  }
}

Note that this does not combine well with boosting if you want a pure matching field count, because a boost scales that field's contribution to the summed score.
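To see why boosts get in the way, here is a small worked example of the arithmetic. It assumes a single-term query and boolean similarity, where each matching field scores exactly its boost, and that most_fields adds up the per-field scores. The function name is made up for illustration.

```python
def most_fields_score(matching_fields, boosts):
    """Sum per-field scores under the stated assumptions:
    each matching field contributes its boost (1.0 when unboosted)."""
    return sum(boosts.get(field, 1.0) for field in matching_fields)


# Suppose country.title and name match, as in the original question.
matched = ["country.title", "name"]

unboosted = most_fields_score(matched, {})
boosted = most_fields_score(matched, {"city.title": 2.0, "name": 3.0})

print(unboosted)  # 2.0, a plain count of matching fields
print(boosted)    # 4.0, no longer a count
```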

And what if I want to keep the boosting and get the number of matched fields back in an additional field of the result? Is that an option?

Alright, so you want to have both the original BM25 scores using the boosts, as well as have Elasticsearch return the number of matching fields? I can think of one way of doing that, which requires your query to be written more verbosely.

First, rewrite your multi_match query to a dis_max query. The dis_max query is a more verbose way of writing a multi_match query of type best_fields. Your query can be rewritten into:

GET hotels/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "city.title": {
              "query": "Spain",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "country.title": {
              "query": "Spain"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "Spain",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

Now, you can use named queries to let Elasticsearch tell you which of the sub queries actually matched each document. Using this feature, the dis_max query can be written as:

GET hotels/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "city.title": {
              "query": "Spain",
              "boost": 2,
              "_name": "1"
            }
          }
        },
        {
          "match": {
            "country.title": {
              "query": "Spain",
              "_name": "2"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "Spain",
              "boost": 3,
              "_name": "3"
            }
          }
        }
      ]
    }
  }
}

Each hit will now have an array of query names that matched this document:

"matched_queries": [
  "1",
  "2",
  "3"
]

Your application can easily get the size of this array to get the number of matching fields.
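For example, a minimal Python sketch of that counting step could look like this. The function name and the sample response are hypothetical; the only thing taken from Elasticsearch is the shape of the response, with hits under hits.hits and a matched_queries array on each hit.

```python
def matched_field_counts(response):
    """Map each hit's _id to the number of named queries it matched."""
    hits = response.get("hits", {}).get("hits", [])
    # Treat a hit without a matched_queries array as zero matches.
    return {hit["_id"]: len(hit.get("matched_queries", [])) for hit in hits}


# Hypothetical, trimmed-down search response for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a", "_score": 3.1, "matched_queries": ["2", "3"]},
            {"_id": "b", "_score": 1.2, "matched_queries": ["1"]},
        ]
    }
}

print(matched_field_counts(sample_response))  # {'a': 2, 'b': 1}
```

The BM25 score with boosts is still available as _score on each hit, so you get both values side by side.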

Thanks Abdon, I will test this out tonight.

Hi Abdon,

Is it possible to do something like this:

GET hotels/_search
{
    "query": {
        "function_score": {
            "query": {
              "dis_max": {
                "queries": [
                  {
                    "match": {
                      "city.title": {
                        "query": "Barcelona",
                        "boost": 2,
                        "_name": "1"
                      }
                    }
                  },
                  {
                    "match": {
                      "country.title": {
                        "query": "Spanje",
                        "_name": "2"
                      }
                    }
                  },
                  {
                    "match": {
                      "name": {
                        "query": "Hotel",
                        "boost": 3,
                        "_name": "3"
                      }
                    }
                  }
                ]
              }
            },
            "script_score" : {
                "script" : {
                  "source": "_source['matched_queries'].size()"
                }
            }
        }
    }
}

So wrap it in a function_score query, and then return the size() of the matched_queries field as the score?

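I don't think that this is possible. As far as I know, matched_queries is not part of _source; it is added to each hit when the response is built, so a script_score script has nothing to read.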
