Weird scoring when using multi-word synonyms

Scoring is impacted by a couple of things, and it's better to look at the validate/explain output without rewrite turned on:

GET test/_validate/query?explain=true
{
  "query": {
    "multi_match": {
      "query": "usa",
      "type": "cross_fields",
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

This gives:

blended(terms:[title:usa, title:america, description:usa, description:america])

and

(blended(terms:[title:america, description:america]) (title:"united states" | description:"united states") blended(terms:[title:usa, description:usa]))

This is what I would expect. Scoring is impacted by:

  • Blending takes the terms in a group and blends their document frequencies, so that the same IDF is computed for each term
  • Blending then takes the dismax of the term scores, so the highest-scoring term wins
  • Blending does not support phrase queries, so in the latter case the query cannot be fully blended; instead there's a dismax over each field's version of the full phrase (title:"united states" | description:"united states")
  • Phrase queries have their own document frequency scoring: the IDF of the phrase is approximated by summing the IDFs of its constituent terms
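To make the points above concrete, here is a simplified Python model of the arithmetic. It is a sketch, not Elasticsearch's implementation: it assumes the BM25 IDF formula, models blending as giving every term in the group the maximum document frequency found in that group (an approximation of what the blended query does to term statistics), and uses made-up document counts. The helper names `idf`, `blended_idf`, `dismax` and `phrase_idf` are hypothetical.

```python
import math


def idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style IDF, the formula used by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def blended_idf(doc_freqs: dict[str, int], doc_count: int) -> dict[str, float]:
    """Give every field:term in a blended group the same IDF.

    Assumption: the blended document frequency is modeled as the maximum df
    in the group; the real blended query adjusts term statistics in a
    similar but more involved way.
    """
    shared = idf(max(doc_freqs.values()), doc_count)
    return {term: shared for term in doc_freqs}


def dismax(scores: list[float], tie_breaker: float = 0.0) -> float:
    """Best score, plus tie_breaker times the sum of the other scores."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def phrase_idf(term_doc_freqs: list[int], doc_count: int) -> float:
    """Approximate a phrase's IDF as the sum of its constituent terms' IDFs."""
    return sum(idf(df, doc_count) for df in term_doc_freqs)


# Hypothetical statistics for an index of 1,000 documents.
doc_count = 1000
group = {
    "title:usa": 10,
    "title:america": 50,
    "description:usa": 200,
    "description:america": 300,
}

print("blended IDF per term:", blended_idf(group, doc_count))
print("unblended title:usa IDF:", idf(10, doc_count))
# Hypothetical df values for "united" and "states" in a single field.
print("phrase IDF for 'united states':", phrase_idf([40, 60], doc_count))
```

With these invented numbers, the rare `title:usa` loses its high IDF once blended, while the phrase clause gets the sum of two single-term IDFs, so it tends to outweigh the blended single-word clauses. That is one way a multi-word synonym can make scores look odd.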

Shameless plug, but we go through this in our relevance training, and I'll give you a sneak peek at this slide, which I think is helpful:

[Slide from the relevance training: image not included]
