Thanks, Elastic team, for adding `knn_query`. With `knn_query` it is now possible to use `function_score` or `script_score` to influence the ranking of vector search results instead of relying solely on the cosine similarity score. This was not possible with the top-level `knn` section.
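To illustrate the ranking point above, here is a minimal sketch of a search request body that wraps a `knn` query in `script_score`, built as a plain Python dict. The helper name, the field names `embedding` and `popularity`, and the boosting formula are hypothetical. The exact DSL accepted depends on the Elasticsearch version, so treat this as an assumption-laden sketch rather than a verified request.

```python
import json

def build_boosted_knn_request(query_vector, num_candidates, size, from_=0):
    """Hypothetical helper: wrap a knn query in script_score so that a
    document field (assumed here to be 'popularity') influences the rank."""
    knn_query = {
        "knn": {
            "field": "embedding",              # hypothetical dense_vector field
            "query_vector": query_vector,
            "num_candidates": num_candidates,  # candidates considered per shard
        }
    }
    return {
        "from": from_,
        "size": size,
        "query": {
            "script_score": {
                "query": knn_query,
                # Scale the vector similarity score by a popularity factor.
                "script": {
                    "source": "_score * (1 + Math.log(1 + doc['popularity'].value))"
                },
            }
        },
    }

body = build_boosted_knn_request([0.1, 0.2, 0.3], num_candidates=50, size=10)
print(json.dumps(body, indent=2))
```

The same wrapping pattern would apply to `function_score`; `script_score` is shown only because its script form is compact.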
However, I have a question about the design of `knn_query`: why is there no separate `k` parameter? Managing kNN searches without a distinct `k` parameter seems both inconvenient and potentially less accurate.
Drawbacks of coupling `k` with `size`:
- Vector Search Systems: In a pure vector search system, if I want to retrieve `k` results and paginate through them with the `size` and `from` parameters, I have to set `num_candidates` and `number_of_shards` such that `num_candidates * number_of_shards` is close (or equal) to `k`. This setup disrupts the typical operation of HNSW, where `num_candidates` (equivalent to `efSearch`) is kept higher than `k` to improve the accuracy of vector matches. The `num_candidates > k` setting compensates for potential accuracy losses caused by the greedy, local nature of graph traversal during an HNSW search.
- Hybrid Search Systems: In most systems I've observed, the maximum number of vector-matching candidates is capped by `k`, and pagination is then handled with the `from` and `size` parameters. With `k` coupled to `size`, however, I have to use `num_candidates` and `number_of_shards` as a proxy to limit the vector-matching candidates, as explained in the previous point.
- Aggregation Results: As documented, the final aggregation results are based on `num_candidates * number_of_shards` documents, which may not be ideal.
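The coupling described in the first and third points comes down to simple arithmetic. The sketch below assumes, as the post does, that the `knn` query surfaces up to `num_candidates` hits per shard. It computes the smallest per-shard `num_candidates` for which the total reaches a desired `k`, and flags when that value falls below `k`, which is the opposite of the usual HNSW practice. The function names are illustrative, not Elasticsearch APIs.

```python
import math

def total_vector_matches(num_candidates: int, number_of_shards: int) -> int:
    """Upper bound on documents matched by the knn query across all shards
    (each shard contributes at most num_candidates hits). Under the post's
    reading of the docs, this is also the document count aggregations see."""
    return num_candidates * number_of_shards

def num_candidates_for_k(k: int, number_of_shards: int) -> int:
    """Smallest per-shard num_candidates with num_candidates * shards >= k."""
    return math.ceil(k / number_of_shards)

def report(k: int, number_of_shards: int):
    nc = num_candidates_for_k(k, number_of_shards)
    total = total_vector_matches(nc, number_of_shards)
    below_k = nc < k  # HNSW normally keeps efSearch (num_candidates) above k
    return nc, total, below_k

for k, shards in [(100, 1), (100, 5), (1000, 20)]:
    nc, total, below_k = report(k, shards)
    print(f"k={k}, shards={shards}: num_candidates={nc}, "
          f"total={total}, num_candidates<k: {below_k}")
```

With a single shard the constraint is harmless, but as soon as there are several shards the per-shard candidate count has to shrink below `k` to keep the total near `k`.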
The potential for `knn_query` to enhance searches with `function_score`, `script_score`, and sub-searches is significant. I'm curious about the rationale behind not supporting a separate `k` parameter in `knn_query`, especially since we already collect `num_candidates` results from each shard. It seems feasible for the coordinating node to simply prune the merged list to `k`.
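A minimal sketch of the pruning proposed above, simulated in plain Python: each shard returns its top `num_candidates` hits as `(score, doc_id)` pairs, a coordinating step merges them, keeps only the global top `k`, and then applies `from`/`size` pagination within that capped set. This is a hypothetical model of the requested behavior, not a description of how Elasticsearch implements `knn_query`; the function and document names are made up for illustration.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (similarity score, document id)

def coordinate(shard_hits: List[List[Hit]], k: int, from_: int, size: int) -> List[Hit]:
    """Merge per-shard candidates, prune to the global top k, then paginate."""
    all_hits = [hit for hits in shard_hits for hit in hits]
    # Keep only the k best-scoring hits across all shards (ties keep input order).
    top_k = heapq.nlargest(k, all_hits, key=lambda h: h[0])
    # from/size pagination now operates inside the capped result set.
    return top_k[from_:from_ + size]

# Two shards, each returning num_candidates = 3 hits.
shards = [
    [(0.95, "a1"), (0.80, "a2"), (0.60, "a3")],
    [(0.90, "b1"), (0.85, "b2"), (0.40, "b3")],
]
for page_start in (0, 2, 4):
    print(page_start, coordinate(shards, k=4, from_=page_start, size=2))
```

With `k=4`, the third page is empty even though the shards supplied six candidates, which is the cap the hybrid-search point asks for while `num_candidates` stays free to be larger than `k`.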
Could you provide some insights into this design choice?