The num_candidates parameter leads to some confusing query results

EricTowns · May 13, 2024, 8:57am

Hi team!
I have an index with only 1 primary shard. I insert documents and then run a kNN search. When I run a kNN query with k=10 and num_candidates=20, I get one set of results. With k=10 and num_candidates=25, I get some documents with higher scores, which confuses me. num_candidates is supposed to be the top n closest documents per shard, so why don't the higher-scoring documents from num_candidates=25 appear in the results for num_candidates=20?

Elasticsearch version: 8.7.0

result:

BenTrent · May 13, 2024, 11:22am

Hey @EricTowns, num_candidates is the number of vectors searched per shard. It is the same as the efSearch parameter in the original HNSW paper: [1603.09320] Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

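In other words, it controls how approximate the search is. With a higher num_candidates, more time is spent exploring the graph, but a more accurate set of neighbors is returned. A lower num_candidates gives a faster but less accurate search. So a search with num_candidates=20 can stop before reaching documents that a wider search with num_candidates=25 does find.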
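To make the efSearch / num_candidates trade-off concrete, here is a minimal sketch of the single-layer best-first search described in the HNSW paper. The parameter `ef` plays the role of num_candidates. The 1-D points, the hand-built neighbor graph, and the names `search_layer`, `points` and `graph` are hypothetical; this is not Lucene's or Elasticsearch's implementation, only an illustration of why a larger candidate list can surface closer neighbors that a smaller one misses.

```python
import heapq


def search_layer(graph, points, query, entry, ef):
    """Best-first search over one graph layer, keeping at most `ef` results."""

    def dist(node):
        return abs(points[node] - query)  # 1-D toy distance

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]    # max-heap via negation: furthest kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest unexplored candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            # Only follow a neighbor if there is room in the list or it beats the furthest result.
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the furthest result
    # Return node ids ordered from nearest to furthest.
    return [node for _, node in sorted((-neg_d, node) for neg_d, node in results)]


# Hypothetical toy data: D is the true nearest neighbor of query 0.0,
# but it is only reachable through C, which is farther away than B.
points = {"A": 5.0, "B": 3.0, "C": 4.5, "D": 0.5}
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}

print(search_layer(graph, points, 0.0, "A", ef=1)[:1])  # ['B']
print(search_layer(graph, points, 0.0, "A", ef=2)[:1])  # ['D']
```

With `ef=1` the search keeps only B (distance 3.0) and stops; with `ef=2` it keeps C in its list long enough to reach D (distance 0.5). Both runs ask for the single nearest neighbor, yet the wider candidate list returns a closer, higher-scoring result, which matches the behavior in the original question.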

EricTowns · May 13, 2024, 1:12pm

Thank you for your answer. I have another question. Sometimes a kNN search doesn't find a document the first time, even though I am sure the doc has been stored successfully. When I write it again, the doc can be found, even though the search vector is the same. Any advice is welcome!

BenTrent · May 13, 2024, 1:54pm

Sometimes, the first time using knn can not search, but I am sure the doc has been stored successfully

Do you mean the vector isn't in the result set? Or that the search throws an exception and returns an error?

Elasticsearch has a refresh API: Refresh API | Elasticsearch Guide [8.13] | Elastic

If you need documents to be searchable immediately, there are several ways to ensure that. You can wait for a refresh on search or on index, force one at index time, or call this API manually. Each option has its own costs and considerations.

But I think your issue is simply that a refresh hasn't occurred between indexing and searching.
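The refresh behavior can be illustrated with a toy model in which writes go to a buffer and only become searchable after a refresh. `ToyIndex` and its methods are hypothetical and heavily simplified (exact brute-force search, no segments or HNSW graphs); in Elasticsearch itself the corresponding controls are the `refresh` parameter on index requests (`true` or `wait_for`) and the Refresh API.

```python
class ToyIndex:
    """Hypothetical toy model of near-real-time search: writes are visible only after refresh()."""

    def __init__(self):
        self._buffer = {}      # indexed but not yet searchable
        self._searchable = {}  # visible to search

    def index(self, doc_id, vector, refresh=False):
        self._buffer[doc_id] = vector
        if refresh:  # loosely mirrors forcing a refresh at index time
            self.refresh()

    def refresh(self):
        # Move everything buffered into the searchable view.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, query, k):
        # Exact nearest neighbors by squared L2 distance, over searchable docs only.
        def sq_dist(item):
            return sum((a - b) ** 2 for a, b in zip(item[1], query))

        ranked = sorted(self._searchable.items(), key=sq_dist)
        return [doc_id for doc_id, _ in ranked[:k]]


idx = ToyIndex()
idx.index("doc1", [1.0, 0.0])
print(idx.search([1.0, 0.0], k=1))  # [] because no refresh has happened yet
idx.refresh()
print(idx.search([1.0, 0.0], k=1))  # ['doc1']
```

In this model a search issued between `index()` and `refresh()` misses the new document even though the write succeeded, which is the situation described above.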

EricTowns · May 14, 2024, 2:10am

The phenomenon is:
For example, I vectorize the text "i am a doctor" and store it. Then, when searching:

I vectorize the question "doctor" and perform a kNN search. At this point, the document is not found.
I vectorize the original text "i am a doctor" and perform a kNN search. This time it is found.
Then I write this vector again and repeat the first search ("doctor"), and now the document is found.
In all of these searches, k and num_candidates are the same, which confuses me. Is this also due to the concept of "nearest neighbor"? At first, I thought the _score of the search results could be used as the degree of "neighborliness", but now it seems that it can't?

Looking forward to your answer; this has bothered me for a long time.

BenTrent · May 14, 2024, 11:46am

I vectorize the question "doctor" and perform a knn search. At this time, no search was found,

Does this mean no data at all? That doesn't make sense, as you should always get k neighbors back.

Or are you talking about a particular neighbor that you are interested in?

Are you forcing a refresh before searching in step 1?

Is this also due to the concept of "nearest neighbor"?

No. _score is determined by vector similarity.
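As a sketch of how vector similarity maps to _score: the Elasticsearch kNN search documentation for 8.x describes `_score = (1 + cosine(query, vector)) / 2` for `cosine` similarity and `_score = 1 / (1 + l2_norm(query, vector)^2)` for `l2_norm`. The helpers `knn_score_cosine` and `knn_score_l2` below are hypothetical names that mirror those formulas in plain floating point; check the docs for your version, since Elasticsearch computes them internally and results may differ slightly.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_score_cosine(query, vector):
    # Assumed mapping for a dense_vector field with similarity "cosine": result is in [0, 1].
    return (1 + cosine(query, vector)) / 2


def knn_score_l2(query, vector):
    # Assumed mapping for similarity "l2_norm": identical vectors score 1.0.
    l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(query, vector)))
    return 1 / (1 + l2 ** 2)


print(knn_score_cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(knn_score_cosine([1.0, 0.0], [0.0, 1.0]))  # 0.5 (orthogonal)
```

For a plain kNN search, the score depends only on the query vector and the stored vector, so the same pair should always produce the same _score. Whether that document appears in the top k depends on whether the approximate search reaches it.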

EricTowns · May 14, 2024, 4:49pm

Oh, that's a mistake in my description. What I meant by not being able to find "I'm a doctor" is this: there were many vectors, and I searched with a query vector and a fixed k. The first time I searched, the document was not in the top k results. After I wrote the vector again, it did appear in the top k results, with a higher score and a higher rank.

Alex_Salgado-Elastic · June 9, 2024, 12:17pm

Hello @EricTowns, is the error still happening? If so, could you send your query here and share a screenshot of the two behaviors?
