Elastic KNN search questions

BenTrent · October 9, 2023, 6:58pm

How are the 10,000 candidates chosen?

num_candidates is the same idea as efSearch. It's the number of candidates we keep track of while searching the HNSW graph. This number is applied per shard.
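To make the parameter concrete, here is a minimal sketch of a request body for a kNN search. The helper name `build_knn_body`, the field name `embedding`, and the example vector are assumptions for illustration only; the 10,000 cap is the limit discussed in this thread.

```python
# Hypothetical helper: builds the body of an Elasticsearch `_search` request
# that uses the top-level `knn` option. Field name and vector are made up.
MAX_NUM_CANDIDATES = 10_000  # the limit discussed in this thread


def build_knn_body(field, query_vector, k, num_candidates):
    """Return a kNN search body after basic sanity checks."""
    # k results are picked from the num_candidates gathered on each shard,
    # so k cannot be larger than num_candidates.
    if not 1 <= k <= num_candidates:
        raise ValueError("k must be between 1 and num_candidates")
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,  # applied per shard
        }
    }


body = build_knn_body("embedding", [0.1, 0.2, 0.3], k=10, num_candidates=100)
```

Raising num_candidates generally trades query speed for recall, since more of the graph is explored on each shard before the top k are returned.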

If we have 1M vector documents, should we pick something like 100 shards to search across all of them, so that each shard has at most 10k documents? We need very good recall on the retrieved results.

I would say not. 1M vectors should fit in a single shard. HNSW is really good at providing high recall even in larger graphs.

So how do we pick shard sizes when we have this limit of 10k candidates per shard for KNN?

I would say you shouldn't.

If you want 100% recall, then you probably don't want to index the vectors at all and should just use brute-force search. But keep in mind that brute force scales linearly with the number of vectors, whereas HNSW scales logarithmically and provides much faster query speeds.
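As a rough illustration of why brute force gives 100% recall but scales linearly, here is a plain-Python sketch of exact kNN. It is not how Elasticsearch implements it; the function names, the choice of cosine similarity, and the toy vectors are assumptions for illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_knn(query, vectors, k):
    """Score every stored vector against the query and keep the best k.

    Every vector is visited once, so the cost grows linearly with the
    number of vectors, but no true neighbor can be missed.
    """
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy data: four 2-dimensional vectors keyed by a made-up document id.
docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
top2 = exact_knn([1.0, 0.0], docs, k=2)
```

Here `top2` contains the two documents closest to the query. HNSW avoids the full scan by walking a graph and tracking only a bounded set of candidates, which is where the speedup and the small loss in recall both come from.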
