Hybrid Search high score on irrelevant documents

Hello, I am building a RAG system and I am facing a problem with the _score returned by a hybrid query that uses kNN.

Or at least, I am not fully understanding how it works.

My Elasticsearch index contains only economy-related documents (chunks with embeddings).

With questions like "how do I make a pizza?", I expect the score of the returned documents to be really low. But it isn't: the scores range from 5 to 8, while economy-related questions get scores of 13+.

What is the score threshold? How do I tell a good score from a bad one?

This is my query:

            "query": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "start_date": {
                                    "lte": now,
                                }
                            }
                        },
                        {
                            "range": {
                                "end_date": {
                                    "gte": now,
                                }
                            }
                        },
                        {
                            "match": {
                                "content": {
                                    "query": text,
                                }
                            }
                        }
                    ]
                }
            },
            "knn": {
                "field": "content_vector",
                "query_vector": embedding,
                "k": retrieval_k,
                "num_candidates": 50,
                "filter": {
                    "bool": {
                        "filter": [
                            {
                                "range": {
                                    "start_date": {
                                        "lte": now,
                                    }
                                }
                            },
                            {
                                "range": {
                                    "end_date": {
                                        "gte": now,
                                    }
                                }
                            }
                        ]
                    }
                }
            }

Thank you in advance!

Hi @mg3090 !

Scoring in lexical search and vector search is very different. When the two are combined, BM25 (lexical) scores tend to be much higher than kNN ones, so they dominate the final score.

There are a couple of options you have:

  • Use RRF to automatically combine the scores from the two queries in a sensible way.
  • Do a linear combination of both scores by boosting the knn and match queries with different values, so you can weight them differently. This requires measuring the scores each query produces on your use case in order to find a good adjustment.
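For the second option, a minimal sketch of what the boosting could look like on a query shaped like yours (the field names match your mapping, but the 0.3/0.7 weights and the query text are purely illustrative, not tuned values):

```json
POST my-index/_search
{
  "query": {
    "match": {
      "content": {
        "query": "inflation forecast",
        "boost": 0.3
      }
    }
  },
  "knn": {
    "field": "content_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 50,
    "boost": 0.7
  }
}
```

The final _score for a document matched by both sections is then roughly 0.3 × BM25 score + 0.7 × vector similarity score, which is why the weights need to be calibrated against the score ranges you actually observe.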

Hybrid search has been shown to provide better results, but you will need to tune the scoring for your use case.

Hope that helps!

Thank you for the fast reply!

I can't find the formula used to "sum" the two different scores.
I would like to understand whether there is a minimum and a maximum score I can expect.

For example, I know that cosine similarity is between 0 and 1.

Even from BM25 I don't expect high numbers like 5-8, let alone 13.
What is the logic behind the final score?

Thank you again :slight_smile:

You can use the explain API to see the details of scoring, but scoring can vary widely between different BM25 queries. (If you're interested in digging in we have some blogs about how BM25 works).

RRF is the "easiest" solution, though it does have some drawbacks. If you choose linear combination instead, the best approach is to run experiments with different weights and determine which work best for your use case.
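For reference, RRF can be requested via the top-level rank section alongside the query and knn sections (exact syntax and availability depend on your version and license tier, so treat this as a sketch):

```json
POST my-index/_search
{
  "query": { "match": { "content": "inflation forecast" } },
  "knn": {
    "field": "content_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 50
  },
  "rank": {
    "rrf": {
      "rank_window_size": 50,
      "rank_constant": 20
    }
  }
}
```

One nice side effect for your original question: RRF scores are computed from ranks rather than raw BM25 or similarity values, so they live in a small, bounded range that is easier to compare across queries.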

Thank you for the reply!
Over the past few days I've tested what you suggested.
For convenience I decided to keep and display both the final score and the kNN score (extracting them from the explain API).

But I noticed that sometimes the BM25 score is missing from the _explanation. Is that normal?

EDIT: sometimes it is the "within top k documents" score that is missing.
I am using the exact same query I wrote in the first post; I just added the explain option.

Example:

"_explanation":{
         "value":0.8893976,
         "description":"sum of:",
         "details":[
            {
               "value":0.8893976,
               "description":"sum of:",
               "details":[
                  {
                     "value":0.8893976,
                     "description":"within top k documents",
                     "details":[
                        
                     ]
                  }
               ]
            },
            {
               "value":0.0,
               "description":"match on required clause, product of:",
               "details":[
                  {
                     "value":0.0,
                     "description":"# clause",
                     "details":[
                        
                     ]
                  },
                  {
                     "value":1.0,
                     "description":"FieldExistsQuery [field=_primary_term]",
                     "details":[
                        
                     ]
                  }
               ]
            }
         ]
      }

*This is with the default boost; I did not change it.

It's possible this is due to how scoring information comes back from each shard. Does it still happen when you run with dfs_query_then_fetch?

The link seems broken; it brings me to a deleted page.
I tried using _search?pretty=true&search_type=dfs_query_then_fetch, if that is what you meant.

But it returns

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "cannot set [search_type] when using [knn] search, since the search type is determined automatically"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "cannot set [search_type] when using [knn] search, since the search type is determined automatically"
  },
  "status": 400
}

Can you confirm which version you're using?

I am using 8.14.3

Thanks. On 8.14 I'm having a hard time reproducing this.

Here's the script I used to try to reproduce it, but it doesn't show the same behavior you're seeing.

Is it possible to distill this into a small reproducible example?

Also, does it happen if you use knn as a query inside the bool clause instead of the top-level knn section?

DELETE test-index

PUT test-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_date": {
        "type": "date"
      },
      "my_num": {
        "type": "integer"
      }
    }
  }
}

GET test-index

PUT test-index/_doc/1
{
  "my_vector": [ 1, 2, 3 ],
  "my_date": 915148800,
  "my_num": 1999, 
  "my_text": "foo"
}

PUT test-index/_doc/2
{
  "my_vector": [ 4, 5, 6 ],
  "my_date": 1704067200,
  "my_num": 2024, 
  "my_text": "foo bar"
}

POST test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "my_date": {
              "gte": 915148700
            }
          }
        },
        {
          "range": {
            "my_date": {
              "lte": 1704067300
            }
          }
        },
        {
          "match": {
            "my_text": "foo"
          }
        }
      ]
    }
  },
  "knn": {
    "field": "my_vector",
    "query_vector": [ 1, 2, 3 ], 
    "k": 10,
    "num_candidates": 100
  },
  "explain": true
}
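For completeness, the "knn inside the bool clause" variant mentioned above would look roughly like this against the same test index (the knn query type needs a recent 8.x release; note it takes no k parameter, since the number of hits is controlled by size):

```json
POST test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "field": "my_vector",
            "query_vector": [ 1, 2, 3 ],
            "num_candidates": 100
          }
        },
        { "match": { "my_text": "foo" } }
      ],
      "filter": [
        { "range": { "my_date": { "gte": 915148700 } } },
        { "range": { "my_date": { "lte": 1704067300 } } }
      ]
    }
  },
  "explain": true
}
```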