Sum score in multi match


(Artem) #1

Hi, guys.

I want to sum the scores in a multi_match query of type cross_fields, with tie_breaker set to 1.0. I tested two versions of Elasticsearch: 5.4.0 and 6.2.1. Version 5.4.0 works as expected, but 6.2.1 takes the maximum of the per-field scores instead of the sum I wanted.

Example code:

PUT test
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "title": {
          "type": "text"
        },
        "description": {
          "type": "text"
        }
      }
    }
  }
}
 POST test/type1
 {
   "title": "Great Gatsby",
   "description": "Great Gatsby is the book about ..."
 }
 
 GET test/type1/_search
 {
   "explain": true, 
   "query": {
     "multi_match": {
       "query": " Great Gatsby",
       "type": "cross_fields", 
       "tie_breaker": 1, 
       "fields": ["title^0.9", "description^0.4"]
     }
   }
 }

Result in 5.4.0:

 {
   "took": 3,
   "timed_out": false,
   "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
   },
   "hits": {
     "total": 1,
     "max_score": 0.67854714,
     "hits": [
       {
         "_shard": "[test][0]",
         "_node": "6I5w97MhSqiV-kkuiKq3Ig",
         "_index": "test",
         "_type": "type1",
         "_id": "AWH2m1UNIkD_TjFR4SoN",
         "_score": 0.67854714,
         "_source": {
           "title": "Great Gatsby",
           "description": "Great Gatsby is the book about ..."
         },
         "_explanation": {
           "value": 0.67854714,
           "description": "sum of:",
           "details": [
             {
               "value": 0.33927357,
               "description": "**sum of**:",
               "details": [
                 {
                   "value": 0.10696911,
                   "description": "weight(description:great in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 },
                 {
                   "value": 0.23230445,
                   "description": "weight(title:great in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 }
               ]
             },
             {
               "value": 0.33927357,
               "description": "**sum of**:",
               "details": [
                 {
                   "value": 0.10696911,
                   "description": "weight(description:gatsby in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 },
                 {
                   "value": 0.23230445,
                   "description": "weight(title:gatsby in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 }
               ]
             }
           ]
         }
       }
     ]
   }
 }

Result in 6.2.1:

 {
   "took": 0,
   "timed_out": false,
   "_shards": {
     "total": 1,
     "successful": 1,
     "skipped": 0,
     "failed": 0
   },
   "hits": {
     "total": 1,
     "max_score": 0.51782775,
     "hits": [
       {
         "_shard": "[test][0]",
         "_node": "fUje_eDeTp6kXL83BFQM_Q",
         "_index": "test",
         "_type": "type1",
         "_id": "LF6d9mEBXfZ7hY5VVfJU",
         "_score": 0.51782775,
         "_source": {
           "title": "Great Gatsby",
           "description": "Great Gatsby is the book about ..."
         },
         "_explanation": {
           "value": 0.51782775,
           "description": "sum of:",
           "details": [
             {
               "value": 0.25891387,
               "description": "**max of**:",
               "details": [
                 {
                   "value": 0.25891387,
                   "description": "weight(title:great in 0) [PerFieldSimilarity], result of:",
                   "details": [
                     {
                       "value": 0.25891387,
                       "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                       "details": ...
                     }
                   ]
                 },
                 {
                   "value": 0.11507284,
                   "description": "weight(description:great in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 }
               ]
             },
             {
               "value": 0.25891387,
               "description": "**max of**:",
               "details": [
                 {
                   "value": 0.25891387,
                   "description": "weight(title:gatsby in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 },
                 {
                   "value": 0.11507284,
                   "description": "weight(description:gatsby in 0) [PerFieldSimilarity], result of:",
                   "details": ...
                 }
               ]
             }
           ]
         }
       }
     ]
   }
 }
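
To make the difference concrete, here is a small arithmetic check in Python using only the per-field weights copied from the two explain outputs above. It is a sketch of how the reported totals can be reproduced, not the actual scoring code; the variable names are made up for illustration, and the comparison uses a tolerance because the reported scores are rounded.

```python
import math

# Per-field weights copied from the two explain outputs above.
weights_5_4_0 = {"title": 0.23230445, "description": 0.10696911}
weights_6_2_1 = {"title": 0.25891387, "description": 0.11507284}

# 5.4.0 adds the field weights for each term ("sum of").
per_term_5 = sum(weights_5_4_0.values())
# 6.2.1 keeps only the best field for each term ("max of").
per_term_6 = max(weights_6_2_1.values())

# "great" and "gatsby" have identical weights here, so the total is twice the per-term value.
total_5 = 2 * per_term_5
total_6 = 2 * per_term_6

# The reported scores are rounded, so compare with a tolerance.
assert math.isclose(total_5, 0.67854714, abs_tol=1e-6)
assert math.isclose(total_6, 0.51782775, abs_tol=1e-6)

# What 6.2.1 would have reported if the field weights were summed instead.
expected_sum_6 = 2 * sum(weights_6_2_1.values())
print(total_5, total_6, expected_sum_6)
```

The last value is what I expected to see in 6.2.1 with tie_breaker set to 1.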

(Artem) #2

I figured out what was going on. I found changes in the source code in MultiMatchQuery.java. The commit 21a57c14945fb0b82d2b78a2c89e0d92bbc086a0 says:

Always use DisjunctionMaxQuery to build cross fields disjunction (#25115)

This commit modifies query_string, simple_query_string and multi_match queries to always use a DisjunctionMaxQuery when a disjunction over multiple fields is built. The tiebreaker is set to 1 in order to behave like the boolean query in terms of scoring.
The removal of the coord factor in Lucene 7 made this change mandatory to correctly handle minimum_should_match.

Closes #23966

But the documentation says that with tie_breaker = 1.0 the query should "Add together the scores for (eg) first_name:will and last_name:will".

In fact it takes the maximum. Is this an error in the documentation?
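
For reference, a disjunction-max score is normally the best field score plus tie_breaker times the sum of the other field scores. The Python sketch below restates that rule. The function name dis_max_score is hypothetical and this is not the Lucene implementation. The input values are the "great" weights from the 6.2.1 explain output above.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a disjunction-max does.

    Result: best score + tie_breaker * (sum of the remaining scores).
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# title and description weights for "great" in the 6.2.1 explain output
scores = [0.25891387, 0.11507284]

only_max = dis_max_score(scores, tie_breaker=0.0)  # pure maximum
summed = dis_max_score(scores, tie_breaker=1.0)    # same as adding the scores
print(only_max, summed)
```

The 0.25891387 reported per term in 6.2.1 matches the tie_breaker = 0 case, which suggests the tie_breaker from the request was not applied. That is only my reading of the numbers.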

To sum the scores for each field, the query should look like this:

GET test/type1/_search
{
  "explain": true,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "type": "most_fields",
            "query": "Great",
            "fields": [
              "title^0.9",
              "description^0.4"
            ]
          }
        },
        {
          "multi_match": {
            "type": "most_fields",
            "query": "Gatsby",
            "fields": [
              "title^0.9",
              "description^0.4"
            ]
          }
        }
      ]
    }
  }
}
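
If the search text is not fixed, the same request body can be generated per term. The Python sketch below only builds the JSON body and does not send it anywhere. The helper name per_term_sum_query is made up, and splitting on whitespace is a naive stand-in for the field analyzer, so treat it as an assumption rather than an equivalent of what Elasticsearch does with the query text.

```python
import json


def per_term_sum_query(text, fields):
    """Build a bool query with one most_fields multi_match clause per term.

    Assumption: terms are obtained by a plain whitespace split,
    which is not the same as running the field analyzer.
    """
    clauses = [
        {
            "multi_match": {
                "type": "most_fields",
                "query": term,
                "fields": list(fields),
            }
        }
        for term in text.split()
    ]
    return {"explain": True, "query": {"bool": {"must": clauses}}}


body = per_term_sum_query("Great Gatsby", ["title^0.9", "description^0.4"])
print(json.dumps(body, indent=2))
```

For "Great Gatsby" this prints the same body as the request above.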

(Nathan Gass) #3

I don't think your solution is equivalent to an Elasticsearch 5 cross_fields multi_match query with tie_breaker set to 1. You are missing the blended IDF scores of cross_fields. In any case, the documentation should at least be fixed if the new behavior is intentional.

I opened an issue at https://github.com/elastic/elasticsearch/issues/28933.


(Artem) #4

Thank you!

