Hello, I am building a RAG application and I am facing a problem with the _score returned by a hybrid query that uses kNN.
Or at least, I do not fully understand how it works.
My Elasticsearch index contains only economy-related documents (chunks with embeddings).
With questions like "how do I make a pizza?", I expect the score of the returned documents to be really low. But it isn't: the score ranges from 5 to 8.
Meanwhile, economy-related questions get a score of 13+.
What is the score threshold? How do I tell a good score from a bad one?
Scoring in lexical search and vector search is very different. When the two are combined, BM25 (lexical) scores tend to be much higher than kNN ones, so they dominate the final score.
You have a couple of options:
Use RRF to automatically combine the scores from the two queries in a way that makes sense.
Do a linear combination of both scores by boosting the knn and match queries with different values, so you can weight them differently. This will require measuring the scores provided by the queries on your use case to provide a good adjustment.
Hybrid search has been shown to provide better results, but you will need to tweak the scoring to your use case.
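To make the RRF option more concrete, here is a minimal Python sketch of the reciprocal rank fusion formula, computed by hand outside Elasticsearch. The function name rrf_fuse, the document ids, and the two rankings are made up for illustration; the rank constant of 60 is the commonly used default. It is only meant to show the idea, not how Elasticsearch implements it internally.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Each document gets 1 / (rank_constant + rank) from every list it
    appears in (rank starts at 1); the contributions are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, best hit first.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
knn_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, knn_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Note that the fused value depends only on the rank positions, not on the raw BM25 or kNN scores, so an RRF score by itself probably will not work as an absolute "good vs. bad" threshold either.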
You can use the explain API to see the details of scoring, but scoring can vary widely between different BM25 queries. (If you're interested in digging in we have some blogs about how BM25 works).
RRF is the "easiest" solution, though it does have some drawbacks. If you choose to do a linear combination, the best way to go is to run experiments with different weights and determine the best weights for your use case.
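As a rough illustration of such an experiment, the Python sketch below applies different boosts to per-document BM25 and kNN scores offline and shows how the top hit changes. All scores, document ids, and the helper names combine and top_doc are hypothetical; in practice you would plug in scores collected from your own queries and compare the rankings against documents you know to be relevant.

```python
def combine(bm25_scores, knn_scores, bm25_boost, knn_boost):
    """Weighted sum of the two scores; a missing score counts as 0."""
    doc_ids = set(bm25_scores) | set(knn_scores)
    return {
        doc_id: bm25_boost * bm25_scores.get(doc_id, 0.0)
        + knn_boost * knn_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def top_doc(combined):
    """Return the doc id with the highest combined score."""
    return max(combined, key=combined.get)


# Hypothetical raw scores for one query: BM25 is unbounded,
# while the kNN similarity here stays between 0 and 1.
bm25_scores = {"doc_a": 12.4, "doc_b": 7.1, "doc_c": 5.3}
knn_scores = {"doc_a": 0.62, "doc_b": 0.91, "doc_d": 0.88}

# Try a few (bm25_boost, knn_boost) pairs and watch the ranking change.
for bm25_boost, knn_boost in [(1.0, 1.0), (0.5, 5.0), (0.1, 10.0)]:
    combined = combine(bm25_scores, knn_scores, bm25_boost, knn_boost)
    print(bm25_boost, knn_boost, top_doc(combined))
```

With equal boosts the BM25 part decides the order almost alone; only once the kNN boost is large enough does the vector similarity start to matter.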
Thank you for the reply!
Over the past few days I've tested what you suggested.
For convenience I decided to keep and display both the final score and the kNN score (extracting them from the explain API output).
But I noticed that sometimes the BM25 score is missing from the _explanation. Is that normal?
EDIT: sometimes it is the "within top k documents" score that is missing.
I am using the exact same query I posted in the first post; I just added the explain option.
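For what it's worth, here is a small Python sketch of how the two parts could be pulled out of an _explanation tree while tolerating a missing part. It assumes the kNN part is the node whose description contains "within top k documents" and that BM25 term scores are nodes whose description starts with "weight("; the sample explanations are hand-written and simplified, not real output. One plausible reason for a missing part (an assumption, not confirmed in this thread) is that a document was returned by only one side of the hybrid query, for example it is in the kNN top k but matches none of the query terms, or the other way around.

```python
def split_scores(node):
    """Return (knn_score, bm25_score) from an explanation tree.

    Either value is None when that part does not appear in the tree.
    """
    description = node.get("description", "")
    if "within top k documents" in description:
        return node["value"], None
    if description.startswith("weight("):
        # Do not descend further: children are parts of this same score.
        return None, node["value"]

    knn, bm25 = None, None
    for child in node.get("details", []):
        child_knn, child_bm25 = split_scores(child)
        if child_knn is not None:
            knn = (knn or 0.0) + child_knn
        if child_bm25 is not None:
            bm25 = (bm25 or 0.0) + child_bm25
    return knn, bm25


# Hand-written, simplified explanations (hypothetical values).
both_parts = {
    "value": 13.2,
    "description": "sum of:",
    "details": [
        {"value": 0.9, "description": "within top k documents", "details": []},
        {
            "value": 12.3,
            "description": "weight(text:economy in 42) [PerFieldSimilarity], result of:",
            "details": [],
        },
    ],
}
knn_only = {
    "value": 0.8,
    "description": "sum of:",
    "details": [
        {"value": 0.8, "description": "within top k documents", "details": []},
    ],
}

print(split_scores(both_parts))
print(split_scores(knn_only))
```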
The link seems broken; it takes me to a deleted page.
I tried using _search?pretty=true&search_type=dfs_query_then_fetch, if that is what you meant.
But it returns:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "cannot set [search_type] when using [knn] search, since the search type is determined automatically"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "cannot set [search_type] when using [knn] search, since the search type is determined automatically"
  },
  "status": 400
}