When doing a knn search there is a parameter k, which specifies the (approximate) number of best matching documents to return. You can also use the size parameter to determine the number of documents to return. Is there ever any reason why size would not be set to the same value?
And what about the from parameter? It can be used with knn searches, but I suppose it works like this: if you have a from value of 10, then Elasticsearch will compute the k+10 best matching documents and return all of them except the first 10. Is that correct? So the higher the from value, the heavier the computation will be?
My questions are related to this query that I am using:
As you mentioned, kNN search adds k matching documents to the search. So, if you set k=20, it finds 20 matches. Then size takes the top-scoring documents from these and returns them; it only controls how many hits are returned in the response. In this case, considering your query, it will return the 20 matches.
It's a little confusing when you're doing only kNN search. But it fits with how the search API is designed.
The from parameter is the same idea, but it defines the number of hits to skip, which is useful for paginating search results. Both the from and size parameters are optional.
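To make the roles of these parameters concrete, here is a minimal sketch of a search body that uses a top-level knn clause together with from and size. The field name embedding, the example vector, and the helper build_knn_request are made up for illustration; only the knn, k, num_candidates, from, and size keys come from the discussion above.

```python
import json


def build_knn_request(query_vector, k, num_candidates, size=10, from_=0):
    """Build a search body with a top-level knn clause plus from/size paging."""
    return {
        "knn": {
            "field": "embedding",  # hypothetical dense_vector field name
            "query_vector": query_vector,
            "k": k,  # how many nearest neighbors the kNN search contributes
            "num_candidates": num_candidates,  # candidates considered per shard
        },
        "from": from_,  # number of hits to skip
        "size": size,  # number of hits to return in the response
    }


# k=20 matches are found, but only 5 of them (after skipping 10) are returned.
body = build_knn_request([0.1, 0.2, 0.3], k=20, num_candidates=100, size=5, from_=10)
print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent as the JSON body of a search request with whichever client you already use.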
Now, to gather results, kNN search finds num_candidates approximate nearest neighbor candidates on each shard. Elasticsearch collects the num_candidates results from each shard, then merges them to find the top k results. So, if what you want is faster searches, you can decrease num_candidates, at the cost of potentially less accurate results.
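The gather-and-merge step described above can be sketched as a toy model in Python. This is not Elasticsearch code: the documents are one-dimensional values, the score is simply the negative distance to the query, and an exact scan per shard stands in for the approximate search, so the sketch shows the merge logic but not the accuracy trade-off. The names shard_candidates and knn_search are hypothetical.

```python
import heapq
import random


def shard_candidates(shard_docs, query, num_candidates):
    """Return up to num_candidates (score, doc_id) pairs from one shard.

    Exact scoring is used here as a stand-in for the approximate search.
    """
    scored = [(-abs(value - query), doc_id) for doc_id, value in shard_docs]
    return heapq.nlargest(num_candidates, scored)


def knn_search(shards, query, k, num_candidates):
    """Collect candidates from every shard, then keep the global top k."""
    merged = []
    for shard_docs in shards:
        merged.extend(shard_candidates(shard_docs, query, num_candidates))
    return heapq.nlargest(k, merged)


random.seed(0)
# Three shards with 50 toy documents each: (doc_id, one-dimensional "vector").
shards = [[(f"s{s}-d{i}", random.random()) for i in range(50)] for s in range(3)]
top = knn_search(shards, query=0.5, k=5, num_candidates=10)
for score, doc_id in top:
    print(doc_id, round(score, 4))
```

In this model a smaller num_candidates means less work per shard, which matches the speed side of the trade-off mentioned above.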
This is not how it works.
We only look directly at the top num_candidates as of version 8.11 and earlier. I am unsure if this behavior will ever change.
If k is larger than size, we still search that many hits, but you will only retrieve size. In the scenario where k is larger than from + size, the results will indeed skip the first from elements and only return size. But we do NOT adjust num_candidates with those provided parameters.
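As a rough illustration of the paging behavior described here, the following Python sketch treats the kNN result as a fixed, already ranked list of k hits and applies from and size as a plain slice. It is a simplified model rather than Elasticsearch internals, and page_of_hits and the doc-N identifiers are invented for the example.

```python
def page_of_hits(ranked_hits, from_, size):
    """Skip the first from_ hits, then return at most size hits."""
    return ranked_hits[from_:from_ + size]


k = 20
# The kNN search produces k ranked hits regardless of from and size.
ranked_hits = [f"doc-{i}" for i in range(k)]

first_page = page_of_hits(ranked_hits, from_=0, size=10)
second_page = page_of_hits(ranked_hits, from_=10, size=10)
# Paging past k yields nothing, because k is not increased to compensate.
third_page = page_of_hits(ranked_hits, from_=20, size=10)

print(first_page[0], second_page[0], len(third_page))
```

Under this model, a kNN-only search that needs deeper pages would have to raise k itself.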
Thanks for the replies!
I see, when combining a regular query with a knn search, the combined results can be more than k, and size will then determine the actual number of results returned. In that case it may make sense to pick a different value for either of those.
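A small Python sketch of that situation, under the assumption that query hits and knn hits are merged as a union and that a document matching both gets the sum of its two scores (my simplified reading of how the combination works, not a copy of the real scoring code). The function combine and all scores are invented for the example.

```python
def combine(query_hits, knn_hits, size):
    """Union the two hit sets, sum scores for shared docs, then trim to size."""
    combined = dict(query_hits)
    for doc_id, score in knn_hits.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked, ranked[:size]


query_hits = {"a": 1.25, "b": 0.75, "c": 0.5}  # hits from the regular query
knn_hits = {"c": 1.0, "d": 0.5}  # k = 2 hits from the knn clause

ranked, returned = combine(query_hits, knn_hits, size=3)
print(len(ranked))  # 4 combined hits, more than k = 2
print([doc_id for doc_id, _ in returned])  # size = 3 decides what comes back
```

Here k bounds only the knn contribution, while size bounds the response, which is why the two values can sensibly differ.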
And my conclusion about the computational costs:
from doesn't affect the computational cost. The query/knn clauses are executed as is and sliced up by size/from at the very end.