Weird scoring when using multi word synonyms

softwaredoug · November 14, 2018, 11:25pm

Scoring is impacted by a couple of things, and it's better to look at the validate/explain output with rewrite turned off:

GET test/_validate/query?explain=true
{
  "query": {
    "multi_match": {
      "query": "usa",
      "type": "cross_fields",
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Which gives

blended(terms:[title:usa, title:america, description:usa, description:america])

and

(blended(terms:[title:america, description:america]) (title:\"united states\" | description:\"united states\") blended(terms:[title:usa, description:usa]))

Which is what I would expect. Scoring is impacted by:

Blending takes each term and blends the document frequencies, so that the same idf is computed for every term
Blending then takes the dismax of the term scores, so the highest-scoring term wins
Blending does not support phrase queries, so in the latter case the query cannot be fully blended, and instead there's a dismax of each representation of the full phrase
Phrase queries have their own document frequency scoring, approximating the idf by taking the sum of the constituent terms' idfs
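
To make those four points concrete, here is a rough Python sketch. It is a toy model, not the Elasticsearch or Lucene implementation: the document frequencies are made up, the function names are hypothetical, and "blending" is simplified to "use the largest document frequency for every term". The idf formula is the BM25 one.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style idf for a single term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def blended_idf(doc_freqs, doc_count):
    """Toy blending: every term gets the idf of the largest document frequency.

    Assumption for illustration only; the real blending logic is more involved.
    """
    return bm25_idf(max(doc_freqs.values()), doc_count)


def dismax(scores, tie_breaker=0.0):
    """Best score wins; the others only count via the tie breaker."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def phrase_idf(term_doc_freqs, doc_count):
    """Phrase idf approximated as the sum of the constituent terms' idfs."""
    return sum(bm25_idf(df, doc_count) for df in term_doc_freqs)


# Made-up statistics for a 1000-document index.
DOC_COUNT = 1000
doc_freqs = {
    "title:usa": 10,
    "title:america": 40,
    "description:usa": 25,
    "description:america": 120,
}

# After blending, all four terms share one idf instead of four different ones.
shared_idf = blended_idf(doc_freqs, DOC_COUNT)
unblended = {term: bm25_idf(df, DOC_COUNT) for term, df in doc_freqs.items()}

# "united states" as a phrase: the idfs of "united" and "states" are summed,
# so a phrase clause can easily outweigh a single blended term.
united_states_idf = phrase_idf([60, 80], DOC_COUNT)

print("blended idf:", round(shared_idf, 3))
print("unblended idfs:", {t: round(v, 3) for t, v in unblended.items()})
print("phrase idf:", round(united_states_idf, 3))
```

In this toy setup the phrase idf comes out larger than the blended idf, because it adds two term idfs together while the blended terms all share one. That matches the kind of imbalance you see when a multi word synonym is scored next to single word ones.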

Shameless plug, but we go through this in our relevance training and I'll give you a sneak peek at this slide which I think is helpful:
