According to the kNN documentation:
You can perform hybrid retrieval by providing both the knn option and a query. This search finds the global top k = 5 vector matches, combines them with the matches from the match query, and finally returns the 10 top-scoring results. The knn and query matches are combined through a disjunction, as if you took a boolean or between them... The score of each hit is the sum of the knn and query scores.
And according to the rescore documentation:
The query rescorer executes a second query only on the Top-K results returned by the query and post_filter phases
What is the expected behavior if we include knn, query, and rescore all at the same time? Will the rescore take as input the top window_size results from the query alone? Or will it take as input the top results from the query-knn hybrid?
Our desired behavior is to rescore the top 400 results from the hybrid output. If simply specifying rescore does not do this, is there some other way to accomplish this?
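For concreteness, the request shape we have in mind is sketched below as a Python dict that could be serialized to JSON for the _search endpoint. The field names (title, title_vector), the query text, the vector values, k, num_candidates, and the rescore weights are all placeholders rather than our real mapping; only window_size = 400 reflects what we actually want.

```python
import json

# Hypothetical fields: "title" (text) and "title_vector" (dense_vector).
search_body = {
    "knn": {
        "field": "title_vector",
        "query_vector": [0.12, -0.45, 0.91],  # placeholder embedding
        "k": 50,                               # placeholder value
        "num_candidates": 500,                 # placeholder value
    },
    "query": {"match": {"title": "mountain lake"}},
    "rescore": {
        "window_size": 400,  # the window we would like applied to the hybrid results
        "query": {
            "rescore_query": {"match_phrase": {"title": "mountain lake"}},
            "query_weight": 1.0,          # placeholder weight
            "rescore_query_weight": 2.0,  # placeholder weight
        },
    },
    "size": 10,
}

# Serialize to the JSON body that would be sent with the search request.
print(json.dumps(search_body, indent=2))
```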
Hey @rajivhs,
Here are the steps that occur when using rescore in conjunction with knn and a query:
- First, the K nearest neighbors are found. This is a global K across all shards.
- Those document scores are combined with the query, and the query is executed against all the shards.
- Per shard, with the combined score, rescore is called on the top window_size documents.
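The three steps above can be sketched as a toy model in plain Python. This is not Elasticsearch code: the function name, the document layout, and the rescore function are made up for illustration, a query_score of 0 stands for "did not match the query", and documents outside each shard's window are simply dropped to keep the example short.

```python
from heapq import nlargest

def hybrid_then_rescore(shards, k, window_size, rescore_fn):
    """Toy model. shards: list of lists of dicts with 'id', 'query_score', 'knn_score'."""
    # Step 1: find the global top-k nearest neighbors across all shards.
    all_docs = [doc for shard in shards for doc in shard]
    knn_ids = {d["id"] for d in nlargest(k, all_docs, key=lambda d: d["knn_score"])}

    merged = []
    for shard in shards:
        # Step 2: combined score = query score + knn score, where the knn
        # part only counts for documents in the global top-k.
        scored = []
        for d in shard:
            knn_part = d["knn_score"] if d["id"] in knn_ids else 0.0
            combined = d["query_score"] + knn_part
            if combined > 0:  # disjunction: matched the query OR is a kNN hit
                scored.append((combined, d))
        scored.sort(key=lambda pair: pair[0], reverse=True)

        # Step 3: rescore only the top window_size documents of THIS shard.
        for combined, d in scored[:window_size]:
            merged.append((d["id"], rescore_fn(combined, d)))

    return sorted(merged, key=lambda pair: pair[1], reverse=True)

shards = [
    [{"id": "a", "query_score": 2.0, "knn_score": 0.75},
     {"id": "b", "query_score": 0.0, "knn_score": 0.5},
     {"id": "c", "query_score": 1.0, "knn_score": 0.125}],
    [{"id": "d", "query_score": 3.0, "knn_score": 0.25},
     {"id": "e", "query_score": 0.5, "knn_score": 0.0625}],
]
ranked = hybrid_then_rescore(shards, k=2, window_size=2,
                             rescore_fn=lambda score, doc: score * 2)
print(ranked)
```

In this toy data the global top 2 neighbors ("a" and "b") both live on the first shard, so the second shard's window is filled purely by query scores, and "b" falls outside the first shard's window even though it is a kNN hit.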
This has the following consequences:
- k is NOT dynamically increased to match window_size. window_size is PER shard, while k is a global top set of values. This means a given shard may not hold any neighbors within the global top k, in which case knn does not contribute to the document scores on that shard.
- Rescore will run on the results of knn. But it may be that the top-scoring documents from query dominate the knn scores, so you don't really see any knn score contribution in the top documents.
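To make the first consequence concrete, here is a small toy calculation (again plain Python, not Elasticsearch code; the shard names and similarity values are invented) that counts how many of the global top k neighbors land on each shard:

```python
from collections import Counter
from heapq import nlargest

def knn_hits_per_shard(similarities_by_shard, k):
    """Count how many of the global top-k similarities come from each shard."""
    # Flatten to (similarity, shard) pairs so the top-k is taken globally.
    pairs = [(sim, shard)
             for shard, sims in similarities_by_shard.items()
             for sim in sims]
    counts = Counter(shard for _, shard in nlargest(k, pairs))
    # Report every shard, including those with zero global top-k hits.
    return {shard: counts.get(shard, 0) for shard in similarities_by_shard}

hits = knn_hits_per_shard(
    {"shard_0": [0.95, 0.9, 0.85], "shard_1": [0.4, 0.3], "shard_2": [0.88]},
    k=3,
)
print(hits)
```

Here shard_1 ends up with no global top k neighbors, so the documents in its rescore window would carry query scores only.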