You can perform hybrid retrieval by providing both the knn option and a query. This search finds the global top k = 5 vector matches, combines them with the matches from the match query, and finally returns the 10 top-scoring results. The knn and query matches are combined through a disjunction, as if you took a boolean or between them... The score of each hit is the sum of the knn and query scores.
The query rescorer executes a second query only on the Top-K results returned by the query and post_filter phases.
What is the expected behavior if we include knn, query and rescore all at the same time? Will the rescore take as input the top window_size results from the query alone? Or will it take as input the top results from the query-knn hybrid?
Our desired behavior is to rescore the top 400 results from the hybrid output. If simply specifying knn, query and rescore does not do this, is there some other way to accomplish it?
Here are the steps that occur when rescore is used together with knn and a query:
First, the K nearest neighbors are found. This is a global K across all shards.
The scores of those documents are combined with the query, and the combined query is executed against all the shards.
Per shard, with the combined score, rescore is called on the TOP window_size documents.
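The three steps above can be sketched as a small simulation. This is only a toy model in plain Python, not Elasticsearch code: the function name `hybrid_then_rescore`, the shard layout, and all scores are made up for illustration, and the rescore step is assumed to simply add the second-pass score to the combined score.

```python
def hybrid_then_rescore(shards, knn_scores, query_scores, rescore_scores, k, window_size):
    """Toy model of knn + query + rescore (hypothetical helper, not an ES API).

    shards:         shard id -> list of doc ids stored on that shard
    knn_scores:     doc id -> vector similarity score
    query_scores:   doc id -> score, only for docs matching the query
    rescore_scores: doc id -> second-pass score
    """
    # Step 1: global top-k nearest neighbors across all shards.
    global_knn = sorted(knn_scores, key=knn_scores.get, reverse=True)[:k]
    knn_hits = {doc: knn_scores[doc] for doc in global_knn}

    results = {}
    for shard_id, docs in shards.items():
        # Step 2: disjunction of knn hits and query matches; score is the sum.
        combined = {}
        for doc in docs:
            if doc in knn_hits or doc in query_scores:
                combined[doc] = knn_hits.get(doc, 0.0) + query_scores.get(doc, 0.0)
        # Step 3: rescore only this shard's top window_size documents.
        ranked = sorted(combined, key=combined.get, reverse=True)
        for doc in ranked[:window_size]:
            # Assumption: the rescore score is added to the combined score.
            combined[doc] += rescore_scores.get(doc, 0.0)
        results[shard_id] = combined
    return results


# Made-up data: both global top-2 neighbors ("a" and "b") live on shard s0.
shards = {"s0": ["a", "b", "c"], "s1": ["d", "e", "f"]}
knn_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.3, "e": 0.2, "f": 0.1}
query_scores = {"c": 2.0, "d": 1.5, "e": 1.0}
rescore_scores = {doc: 10.0 for doc in "abcdef"}

results = hybrid_then_rescore(shards, knn_scores, query_scores, rescore_scores,
                              k=2, window_size=1)
```

In this made-up data, shard s1 holds none of the global top-2 neighbors, so its documents are scored by the query alone, and on shard s0 the query match "c" outranks both neighbors and is the only document that gets rescored.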
This has the following consequences:
k is NOT dynamically increased to match window_size.
window_size is PER shard, whereas k is a global top set of values. This means a given shard may not hold any neighbors within the global top k, in which case knn does not contribute to any document score on that shard.
Rescore will run on the combined results of query and knn. However, the top-scoring documents from query may dominate the knn scores, in which case you won't really see any knn score contribution in the top documents.
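Given these consequences, one thing to try for the "rescore the top 400 hybrid results" case is to set k explicitly to the same value as window_size, since k is not raised automatically. Below is a minimal sketch of such a request body, built as a Python dict. The field names `embedding` and `title`, the query vector, the query text, and the weights are placeholders, and because window_size applies per shard this does not guarantee that exactly the global top 400 are rescored.

```python
import json

WINDOW_SIZE = 400  # number of top documents to rescore on each shard

search_body = {
    "knn": {
        "field": "embedding",             # placeholder dense_vector field
        "query_vector": [0.1, 0.2, 0.3],  # placeholder vector
        "k": WINDOW_SIZE,                 # set by hand; not derived from window_size
        "num_candidates": 1000,           # per-shard candidates, must be >= k
    },
    "query": {"match": {"title": "hybrid search"}},  # placeholder query
    "rescore": {
        "window_size": WINDOW_SIZE,
        "query": {
            "rescore_query": {"match_phrase": {"title": "hybrid search"}},
            "query_weight": 1.0,          # placeholder weights
            "rescore_query_weight": 2.0,
        },
    },
    "size": 10,
}

# Serialize to JSON to send as the body of a _search request.
print(json.dumps(search_body, indent=2))
```

If the query scores tend to dominate, adjusting the relative weight of the knn and query clauses (for example with a boost on either one) may help the vector matches reach the rescore window, but that would need to be tuned against your own data.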