Hybrid Search high score on irrelevant documents

Hello, I am building a RAG system and I am facing a problem with the _score returned by a hybrid query that uses kNN.

Or at least, I am not fully understanding how it works.

My Elasticsearch index contains only economy-related documents (chunks with embeddings).

With questions like "how do I make a pizza?", I expect the score of the returned documents to be really low. But it isn't: the scores range from 5 to 8, while economy-related questions get scores of 13+.

What is the score threshold? How do I tell a good score from a bad one?

This is my query:

            "query": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "start_date": {
                                    "lte": now,
                                }
                            }
                        },
                        {
                            "range": {
                                "end_date": {
                                    "gte": now,
                                }
                            }
                        },
                        {
                            "match": {
                                "content": {
                                    "query": text,
                                }
                            }
                        }
                    ]
                }
            },
            "knn": {
                "field": "content_vector",
                "query_vector": embedding,
                "k": retrieval_k,
                "num_candidates": 50,
                "filter": {
                    "bool": {
                        "filter": [
                            {
                                "range": {
                                    "start_date": {
                                        "lte": now,
                                    }
                                }
                            },
                            {
                                "range": {
                                    "end_date": {
                                        "gte": now,
                                    }
                                }
                            }
                        ]
                    }
                }
            }

Thank you in advance!

Hi @mg3090 !

Scoring in lexical search and vector search is very different. When the two are combined, BM25 (lexical) scores tend to be much higher than kNN ones, so they dominate the final score.

There are a couple of options you have:

  • Use RRF to automatically combine the scores from the two queries in a sensible way.
  • Do a linear combination of both scores by boosting the knn and match queries with different values, so you can weight them differently. This requires measuring the scores each query produces on your use case in order to find a good adjustment.
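For the second option, a minimal sketch of what the boosting could look like on a query shaped like yours (the field names match your mapping, but the 0.3/0.7 weights and the query text are purely illustrative, not tuned values):

```json
POST my-index/_search
{
  "query": {
    "match": {
      "content": {
        "query": "inflation forecast",
        "boost": 0.3
      }
    }
  },
  "knn": {
    "field": "content_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 50,
    "boost": 0.7
  }
}
```

The final _score for a document matched by both sections is then roughly 0.3 × BM25 score + 0.7 × vector similarity score, which is why the weights need to be calibrated against the score ranges you actually observe.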

Hybrid search has been shown to provide better results, but you will need to tune the scoring for your use case.

Hope that helps!

Thank you for the fast reply!

I can't find the formula used to "sum" the two different scores.
I would like to understand whether there is a minimum and a maximum score I can expect.

For example, I know that cosine similarity is between 0 and 1.

Even from BM25 I don't expect high numbers like 5-8, let alone 13.
What is the logic behind the final score?

Thank you again :slight_smile:

You can use the explain API to see the details of scoring, but scoring can vary widely between different BM25 queries. (If you're interested in digging in we have some blogs about how BM25 works).

RRF is the "easiest" solution, though it does have some drawbacks. If you choose linear combination instead, the best approach is to run experiments with different weights and determine which work best for your use case.
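For reference, RRF can be requested via the top-level rank section alongside the query and knn sections (exact syntax and availability depend on your version and license tier, so treat this as a sketch):

```json
POST my-index/_search
{
  "query": { "match": { "content": "inflation forecast" } },
  "knn": {
    "field": "content_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 50
  },
  "rank": {
    "rrf": {
      "rank_window_size": 50,
      "rank_constant": 20
    }
  }
}
```

One nice side effect for your original question: RRF scores are computed from ranks rather than raw BM25 or similarity values, so they live in a small, bounded range that is easier to compare across queries.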

Thank you for the reply!
Over the past few days I've tested what you suggested.
For convenience I decided to keep and display both the final score and the kNN score (extracting them from the explain API).

But I noticed that sometimes the BM25 score is missing from the _explanation. Is that normal?

EDIT: sometimes it is the "within top k documents" score that is missing.
I am using the exact same query I wrote in the first post; I just added the explain option.

Example:

"_explanation":{
         "value":0.8893976,
         "description":"sum of:",
         "details":[
            {
               "value":0.8893976,
               "description":"sum of:",
               "details":[
                  {
                     "value":0.8893976,
                     "description":"within top k documents",
                     "details":[
                        
                     ]
                  }
               ]
            },
            {
               "value":0.0,
               "description":"match on required clause, product of:",
               "details":[
                  {
                     "value":0.0,
                     "description":"# clause",
                     "details":[
                        
                     ]
                  },
                  {
                     "value":1.0,
                     "description":"FieldExistsQuery [field=_primary_term]",
                     "details":[
                        
                     ]
                  }
               ]
            }
         ]
      }

*This is with the default boost; I did not change it.

It's possible this is due to how scoring information comes back from each shard. Does it still happen when you run with dfs_query_then_fetch?

The link seems broken; it brings me to a deleted page.
I tried using _search?pretty=true&search_type=dfs_query_then_fetch, if that is what you meant.

But it returns

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "cannot set [search_type] when using [knn] search, since the search type is determined automatically"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "cannot set [search_type] when using [knn] search, since the search type is determined automatically"
  },
  "status": 400
}

Can you confirm which version you're using?

I am using 8.14.3

Thanks. On 8.14 I'm having a hard time reproducing this.

Here's the script I used to try to reproduce it, but it doesn't show the same behavior you're seeing.

Is it possible to distill this into a small reproducible example?

Also, does it happen if you use knn as a query inside the bool clause instead of the top-level knn section?

DELETE test-index

PUT test-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_date": {
        "type": "date"
      },
      "my_num": {
        "type": "integer"
      }
    }
  }
}

GET test-index

PUT test-index/_doc/1
{
  "my_vector": [ 1, 2, 3 ],
  "my_date": 915148800,
  "my_num": 1999, 
  "my_text": "foo"
}

PUT test-index/_doc/2
{
  "my_vector": [ 4, 5, 6 ],
  "my_date": 1704067200,
  "my_num": 2024, 
  "my_text": "foo bar"
}

POST test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "my_date": {
              "gte": 915148700
            }
          }
        },
        {
          "range": {
            "my_date": {
              "lte": 1704067300
            }
          }
        },
        {
          "match": {
            "my_text": "foo"
          }
        }
      ]
    }
  },
  "knn": {
    "field": "my_vector",
    "query_vector": [ 1, 2, 3 ], 
    "k": 10,
    "num_candidates": 100
  },
  "explain": true
}
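For completeness, the "knn inside the bool clause" variant mentioned above would look roughly like this against the same test index (the knn query type needs a recent 8.x release; note it takes no k parameter, since the number of hits is controlled by size):

```json
POST test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "field": "my_vector",
            "query_vector": [ 1, 2, 3 ],
            "num_candidates": 100
          }
        },
        { "match": { "my_text": "foo" } }
      ],
      "filter": [
        { "range": { "my_date": { "gte": 915148700 } } },
        { "range": { "my_date": { "lte": 1704067300 } } }
      ]
    }
  },
  "explain": true
}
```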