Weird scoring when using multi word synonyms

Hi all!

I just ran into an issue with the scoring of multiword synonyms.
When a synonym resolves to a multi-word synonym, the underlying scoring model seems to change, resulting in unfair scoring for documents that contain the multi-word synonym.
I created a small setup to reproduce my issue (bear with me, it's a lot of code):

DELETE test
PUT test
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "synonyms": {
          "type": "synonym_graph",
          "synonyms": [
            "usa, america",
            "usa, united states"
          ]
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "synonyms"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "synonym_analyzer"
        },
        "description": {
          "type": "text",
          "analyzer": "synonym_analyzer"
        }
      }
    }
  }
}

When validating the following two queries with rewrite=true, I get the Lucene queries listed after them:

Query 1:

GET test/_validate/query?rewrite=true
{
  "query": {
    "multi_match": {
      "query": "usa",
      "type": "cross_fields",
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Query 2:

GET test/_validate/query?rewrite=true
{
  "query": {
    "multi_match": {
      "query": "america",
      "type": "cross_fields", 
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Result 1:

(title:america | description:america) (title:"united states" | description:"united states") (title:usa | description:usa)

Result 2:

(title:usa | title:america | description:usa | description:america)

When executing the above-mentioned ES queries and analyzing the results with explain, we can see the following effect:

  • In the first query, the scores of the individual query parts get SUM'ed.
  • In the second query, the MAX of the individual scores is used for scoring.

On a big data set, this produces totally different search results.
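To make the difference concrete, here is a minimal Python sketch of the two query shapes. The per-clause scores are made-up toy numbers (not taken from the explain output), and `dis_max` and `sum_of_groups` are hypothetical helper names used only for illustration:

```python
def dis_max(scores):
    """Disjunction-max group: the best matching clause wins."""
    matching = [s for s in scores if s > 0]
    return max(matching) if matching else 0.0


def sum_of_groups(groups):
    """Boolean query whose clauses are dis_max groups: group scores add up."""
    return sum(dis_max(group) for group in groups)


# Query "usa": three groups, each over (title, description). Toy scores only.
usa_groups = [
    [0.25, 0.0],  # title:america | description:america
    [0.0, 1.0],   # title:"united states" | description:"united states"
    [0.25, 0.0],  # title:usa | description:usa
]

# Query "america": one group over all four term clauses. Toy scores only.
america_group = [0.25, 0.25, 0.0, 0.0]

usa_score = sum_of_groups(usa_groups)
america_score = dis_max(america_group)
print(usa_score, america_score)
```

With these toy numbers, the document scores much higher for usa than for america, even though both queries are meant to be synonyms of each other.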

Results showing this:

POST test/test/1
{
  "title": "The USA is a very big country",
  "description": "Book about the states"
}

POST test/test/2
{
  "title": "The USA is a very big country",
  "description": "Movie about the united states"
}
POST test/_refresh

Query:

GET test/_search
{
  "explain": true, 
  "query": {
    "multi_match": {
      "query": "usa",
      "type": "cross_fields", 
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Explain:

 "_source": {
          "title": "The USA is a very big country",
          "description": "Movie about the united states"
        },
 "_explanation": {
  "value": 1.272605,
  "description": "sum of:",
  "details": [
    {
      "value": 0.19856805,
      "description": "max of:",
      "details": [
        {
          "value": 0.19856805,
          "description": "weight(title:america in 1) [PerFieldSimilarity], result of:",
          "details": [
            {
           ... rest removed

Query:

GET test/_search
{
  "explain": true, 
  "query": {
    "multi_match": {
      "query": "america",
      "type": "cross_fields", 
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Explain:

"_source": {
          "title": "The USA is a very big country",
          "description": "Book about the states"
        },
"_explanation": {
  "value": 0.19856805,
  "description": "max of:",
  "details": [
    {
      "value": 0.19856805,
      "description": "weight(title:usa in 0) [PerFieldSimilarity], result of:",
      "details": [
        {
           ... rest removed

Is this expected behavior? Or is Lucene/Elasticsearch weird in handling multiword synonym expansions?
Running Elasticsearch 6.4.2

Bump :blush:

When querying with usa, your query gets translated into usa, united, states and america (which matches both fields). When querying with america, you only get usa and america (which matches only the title field). This is what impacts the overall scores. The explain output only shows the clauses that contribute to the score; if a particular clause produces no score, you will not see it under explain. That doesn't really mean the scoring structure or scheme is different.

Scoring is impacted by a couple of things, and it's better to look at the validate/explain output without rewrite turned on:

GET test/_validate/query?explain=true
{
  "query": {
    "multi_match": {
      "query": "usa",
      "type": "cross_fields",
      "fields": [
        "title",
        "description"
      ]
    }
  }
}

Which gives, for the query america:

blended(terms:[title:usa, title:america, description:usa, description:america])

and, for the query usa:

(blended(terms:[title:america, description:america]) (title:"united states" | description:"united states") blended(terms:[title:usa, description:usa]))

Which is what I would expect. Scoring is impacted by:

  • Blending takes each term and blends the document frequencies, to compute the same idf for each term
  • Blending then takes the dismax of the term scores, i.e. the highest-scoring term
  • Blending does not support phrase queries, so in the latter case the query cannot be fully blended; instead there's a dismax over each representation of the full phrase
  • Phrase queries have their own document frequency scoring, approximating the idf by taking the sum of the constituent terms' idfs
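
The points above can be sketched roughly in Python. This is a simplified illustration, not Elasticsearch's actual blending code: it assumes the shared idf is derived from the maximum document frequency among the blended terms, and all document frequencies and the index size are hypothetical. The idf formula is the standard BM25 one.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 idf: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def blended_idf(doc_freqs, doc_count):
    """Simplified blending (assumption): all terms share the idf of the max df."""
    return bm25_idf(max(doc_freqs.values()), doc_count)


def phrase_idf(term_doc_freqs, doc_count):
    """Phrase idf approximated as the sum of the constituent terms' idfs."""
    return sum(bm25_idf(df, doc_count) for df in term_doc_freqs)


doc_count = 1000  # hypothetical index size
# Hypothetical document frequencies per field:term.
dfs = {
    "title:usa": 40,
    "title:america": 25,
    "description:usa": 10,
    "description:america": 5,
}

shared = blended_idf(dfs, doc_count)
unblended = {term: bm25_idf(df, doc_count) for term, df in dfs.items()}
phrase = phrase_idf([30, 20], doc_count)  # hypothetical dfs for "united", "states"

print("blended idf:", shared)
print("per-term idf without blending:", unblended)
print("phrase idf:", phrase)
```

Under these assumptions the phrase clause ends up with a noticeably larger idf than any blended term, which is one reason the "united states" match can dominate the summed score.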

Shameless plug, but we go through this in our relevance training, and I'll give you a sneak peek at this slide, which I think is helpful:

image

I would like to add one more thing: you are using the synonym_graph token filter at both index and query time. Generally it is not necessary to expand synonyms at both index and query time. If you are already indexing all synonyms, there is no need to also search for all those synonyms.

In fact, the synonym_graph token filter is designed to be used at query time only (see this blog for more details). Using it at index time may lead to unexpected results.

I'd recommend that you use the synonym_analyzer as the search_analyzer in your mapping for the title and description fields, instead of as the analyzer.
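
As a rough sketch of that suggestion, the Python snippet below builds an index body in which synonym_analyzer is only the search_analyzer. The index-time analyzer name plain_analyzer and the helper text_field are hypothetical, introduced just for this example; the rest mirrors the settings from the original post.

```python
import json

# Hypothetical name for an index-time analyzer without synonyms.
INDEX_ANALYZER = "plain_analyzer"


def text_field(index_analyzer, search_analyzer):
    """Text field that analyzes differently at index time and at search time."""
    return {
        "type": "text",
        "analyzer": index_analyzer,
        "search_analyzer": search_analyzer,
    }


body = {
    "settings": {
        "analysis": {
            "filter": {
                "synonyms": {
                    "type": "synonym_graph",
                    "synonyms": ["usa, america", "usa, united states"],
                }
            },
            "analyzer": {
                # Index time: no synonym expansion.
                INDEX_ANALYZER: {"tokenizer": "whitespace", "filter": ["lowercase"]},
                # Search time: synonyms are expanded in the query only.
                "synonym_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "synonyms"],
                },
            },
        }
    },
    "mappings": {
        "test": {
            "properties": {
                "title": text_field(INDEX_ANALYZER, "synonym_analyzer"),
                "description": text_field(INDEX_ANALYZER, "synonym_analyzer"),
            }
        }
    },
}

# Print the JSON body that could be sent with PUT test.
print(json.dumps(body, indent=2))
```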

I think that was a mis-copy when I made the snippet; the results were actually generated with the normal synonym filter.

Thanks for your feedback, Doug.

I think I understand a bit better what's going on and what to do to combat this.
It would be nice to have more control over the way synonyms are scored.

The easiest way at the moment is to rewrite all multi-word synonyms into a single term, so that scoring works the same for all terms.
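
One possible way to do that rewriting, sketched in Python as a client-side preprocessing step applied to both documents and query strings. This is only an assumption about how it could be done; the MULTIWORD table, the united_states token, and collapse_multiword are hypothetical names.

```python
import re

# Hypothetical table: multi-word synonym -> single-token replacement.
MULTIWORD = {"united states": "united_states"}

# With the phrase collapsed, the synonym rules contain single terms only.
SYNONYM_RULES = ["usa, america, united_states"]


def collapse_multiword(text, mapping=MULTIWORD):
    """Replace each multi-word synonym with its single-token form (case-insensitive)."""
    for phrase, token in mapping.items():
        # Allow any whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        text = re.sub(pattern, token, text, flags=re.IGNORECASE)
    return text


collapsed = collapse_multiword("Movie about the United States")
print(collapsed)
```

With the whitespace tokenizer from the original settings, united_states stays a single token, so every synonym in the rule set is a plain term and no phrase query is generated.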
