Why "knn_query" doesn’t have a separate k parameter?

Thanks, Elastic team, for adding knn_query. Now with knn_query it is possible to use function_score or script_score to influence the ranking of vector search results, rather than relying solely on the cosine similarity score. This was not possible with the top-level knn section.
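For example, something like the following should now be possible (a rough sketch; click_rate is a hypothetical numeric field used as a business signal, and my-image-index / image-vector are just placeholder names):

POST my-image-index/_search
{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "knn": {
          "field": "image-vector",
          "query_vector": [-5, 9, -12],
          "num_candidates": 100
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "click_rate",
            "missing": 1
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}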

However, I have a question regarding the design of knn_query: Why is there no separate k parameter? Managing kNN searches without a distinct k parameter seems both inconvenient and potentially less accurate.

Drawbacks of Coupling k with size:

  1. Vector Search Systems: For a pure vector search system, if I want to retrieve k results and use the size and from parameters for pagination, I have to set num_candidates and number_of_shards such that num_candidates * number_of_shards is close (or equal) to k (see the example request after this list). This setup disrupts the typical operation of HNSW, where num_candidates (equivalent to efSearch) is kept higher than k to improve the accuracy of vector matches. Setting num_candidates > k compensates for potential accuracy losses due to the greedy, local nature of node traversal during an HNSW search.

  2. Hybrid Search Systems: In most systems I've observed, there's a limit on the maximum number of vector matching candidates, controlled by k, and pagination is then handled with the from and size parameters. But with the k parameter coupled to the size parameter, as explained in the point above, we have to use num_candidates and number_of_shards as a proxy to limit the vector matching candidates.

  3. Aggregation Results: As documented, the final results from aggregations contain num_candidates * number_of_shards documents, which may not be ideal.
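To make the first drawback concrete, here is a rough sketch of a paginated pure vector search with the knn query (my-image-index and image-vector are placeholder names reused from the hybrid search example later in this thread):

POST my-image-index/_search
{
  "from": 20,
  "size": 10,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "num_candidates": 30
    }
  }
}

There is no k to cap the vector matches: each shard contributes up to num_candidates candidates, so with, say, 3 shards roughly 30 * 3 = 90 vectors feed into the result set. The only way to keep that total near a target k is to tune num_candidates against the shard count, instead of keeping num_candidates comfortably above k as HNSW expects.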

The potential for knn_query to enhance searches with function_score, script_score, and sub-searches is significant. I'm curious about the rationale behind not supporting a separate k parameter in knn_query, especially since we are already collecting a "num_candidates" number of results from each shard. It seems feasible for the coordinator or manager node to simply prune the list to k.

Could you provide some insights into this design choice?


Thanks for your interesting questions and providing your insights.

  • The decision to use size for the knn query comes from how Elasticsearch queries are organized: they all return at most size results. The knn query, being one of them, operates the same way – it returns size results.

  • I'm not sure I understood your point about pagination (why does k need to be close to num_candidates*number_of_shards?). I can see how, for deeper pagination, you need to increase num_candidates to make sure its value is > from + size. Overall, pagination with knn search is tricky, and the best way to implement it is to retrieve all the results you want and paginate at the application level.

  • About hybrid search, that's a fair point about allowing control of the vector matching candidates. But this control can be achieved in different ways: 1) boosting, 2) applying a minimum similarity parameter to the knn query (see the sketch after this list). Are these controls not enough?

  • About aggregation results, that's a fair point. If your aggregation results need to reflect the precise counts of documents returned in a request, you can use the top-level knn search. Very often we've found that it isn't needed, though.
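For illustration, a minimal sketch of the second control (the similarity threshold value of 0.9 is arbitrary, and the index and field names are just the placeholders used elsewhere in this thread):

POST my-image-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "num_candidates": 50,
      "similarity": 0.9,
      "boost": 2
    }
  }
}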

Thanks for looking into it @mayya :pray:

About hybrid search, that's a fair point about allowing control of the vector matching candidates. But this control can be achieved in different ways: 1) boosting, 2) applying a minimum similarity parameter to the knn query. Are these controls not enough?

We do have the boosting and similarity threshold parameters. The good thing about the vector similarity score (in our case cosine similarity) is that, unlike the BM25 score, its value reflects an absolute degree of relevance. However, tuning the similarity threshold is a bit tricky because it depends heavily on the underlying distribution of <query, document> scores. And since in vector space everyone is a neighbor of everyone (every document has some similarity to the query), I believe we should also restrict the vector matching candidates using k.

(The above is my main concern; the points below are secondary concerns.)

I'm not sure I understood your point about pagination (why does k need to be close to num_candidates*number_of_shards?).

What I mean by this: suppose the business requirement is that we only want to fetch at most "k" vector matching candidates. The only way to do this with the knn query is by adjusting num_candidates and number_of_shards such that num_candidates * number_of_shards equals k (see the example below). Strictly speaking, from the HNSW perspective, using num_candidates without k can lead to a drop in the accuracy of the matching candidates; let me know if you would like me to elaborate on this.
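For example (a sketch, assuming an index with 4 primary shards and a business limit of at most k = 100 vector candidates), the workaround is to set num_candidates = 100 / 4 = 25:

POST my-image-index/_search
{
  "size": 100,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "num_candidates": 25
    }
  }
}

This caps the vector matches at roughly 25 * 4 = 100, but it leaves num_candidates far below the value you would normally choose for good HNSW recall.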

The decision to use size for the knn query comes from how Elasticsearch queries are organized: they all return at most size results. The knn query, being one of them, operates the same way – it returns size results.

This makes sense. But I feel that, in order to keep the process consistent, we are altering the intrinsic behavior of kNN search (whose job is to fetch the k nearest neighbours). I think size and k could co-exist independently. Suppose I'm running only a knn query: first we retrieve the k nearest neighbours (let's call them results) and then return the results[:size] documents, or results[from : from + size] in the case of pagination.

I can see how, for deeper pagination, you need to increase num_candidates to make sure its value is > from + size. Overall, pagination with knn search is tricky, and the best way to implement it is to retrieve all the results you want and paginate at the application level.

Pagination is important for e-commerce search, and doing it at the application level might not be feasible. Additionally, we would be forgoing all the pagination optimizations done by the Elasticsearch team.

I believe that we should also restrict the vector matching candidates using k

Restricting vector matches to k is a valid use case and need. We can consider that, but remember that k will still be applied on a per-shard basis.

the business requirement is that we only want to fetch at most "k" vector matching candidates

You should not use knn_query for this then. Indeed, limiting num_candidates could lead to a drop in accuracy.

I think size and k could co-exist independently. Suppose I'm running only a knn query: first we retrieve the k nearest neighbours (let's call them results) and then return the results[:size] documents, or results[from : from + size] in the case of pagination.

I guess you can do pagination, if your k is big enough to cover all pages.

After taking the k most similar results from each shard, does it then merge the results from all shards and take the global top k matches (similar to how it is done in the top-level knn search)?

To gather results, the kNN search API finds a `num_candidates` number of
approximate nearest neighbor candidates on each shard.
The search computes the similarity of these candidate vectors to the query
vector, selecting the `k` most similar results from each shard.
The search then merges the results from each shard to return the global top `k` nearest neighbors.
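
For reference, that behavior belongs to the top-level knn search, which looks roughly like this (placeholder index and field names; note that k and num_candidates are separate parameters there):

POST my-image-index/_search
{
  "size": 10,
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100
  }
}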

Isn't this how the majority of hybrid search systems are designed, where we define the maximum number of vector matching candidates to be included? For example, here is the wording from the knn-query-in-hybrid-search doc:

Knn query can be used as a part of hybrid search, where knn query is
combined with other lexical queries. For example, the query below finds
documents with title matching mountain lake, and combines them with
the top 10 documents that have the closest image vectors to the query_vector. 

Query:

POST my-image-index/_search
{
  "size" : 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "mountain lake",
              "boost": 1
            }
          }
        },
        {
          "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "num_candidates": 10,
            "boost": 2
          }
        }
      ]
    }
  }
}

In the above example, even though it says:

combines them with the top 10 documents that have the closest image vectors to the query_vector.

But it is actually combining 10 * number_of_shards documents. If support for a k parameter were added, then by setting its value to 10 we could achieve what is described in the documentation (see the sketch below).
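In other words, something like this hypothetical request (the k parameter inside the knn query does not exist today; it is shown only to illustrate the proposal):

POST my-image-index/_search
{
  "size": 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "mountain lake",
              "boost": 1
            }
          }
        },
        {
          "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "k": 10,
            "num_candidates": 50,
            "boost": 2
          }
        }
      ]
    }
  }
}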


Indeed, each shard combines its own 10 nearest vectors with its 10 lexical matches to obtain the shard's 10 results. Then all shards send their 10 results to the coordinator, which combines the results from all shards to obtain the global 10 results. So overall, indeed, 10 * number_of_shards nearest vectors were considered.

After taking the k most similar results from each shard, does it then merge the results from all shards and take the global top k matches?

For the standalone knn query, yes, that's what you get. But for hybrid search that may not be the case; you can get more than k nearest vectors if your size is big.

Will it still be useful to have a separate k parameter, if it is applicable only on a per-shard basis?

Here you mean that each shard collects results for each query (e.g. lexical, knn) and merges them first at the shard level, then sends them to the coordinator node, which merges the results from each shard and generates the final result.

For example, assume num_candidates=2 and number_of_shards=2:

Shard 1:

  • Lexical search results: [d1:10, d2:2] # each entry in the list is "document_id:score"

  • KNN search results (num_candidates=2): [d1:0.1, d3:0.5]

    Merged result for Shard 1: [d1:10.1, d2:2, d3: 0.5] # assuming combine mode is "sum"

Shard 2:

  • Lexical search results: [d5:100, d6:50]

  • KNN search results (num_candidates=2): [d7:0.1, d8:0.5]

    Merged result for Shard 2: [d5:100, d6:50, d7: 0.1, d8: 0.5]

Final Result:

  • Combined and sorted by the coordinator node: [d5:100, d6:50, d1:10.1, d2:2, d3: 0.5, d8: 0.5, d7: 0.1]

Whereas the top-level knn first finds num_candidates approximate nearest neighbor candidates on each shard, then selects the k best results from each shard, and then merges the results from each shard to return the global top k nearest neighbors.

Is my understanding correct here?

Yes, it is correct. As the knn query operates at the shard level, even if we introduce a k parameter it will be applied at the shard level.
As in your example, even with k=2, globally we got 4 knn results.


The top-level knn search has an extra phase where it first collects the global k results; this ensures that globally we always get k results regardless of the number of shards.


Thanks for explaining how the merging works differently for the knn query and the top-level knn.

Coming back to

Will it still be useful to have a separate k parameter, if it is applicable only on a per-shard basis?

short response:

I think we should still add support for a separate k parameter (even if applied at the shard level), for the very same reason we have a separate k parameter in the top-level knn: it not only helps when selecting the top k at the coordinator node, it is also used to select the top k within each shard.

not so short response:

For a moment, let's keep Elasticsearch and sharding out of the picture and look at how the search for the nearest neighbours of a given query_vector is performed in an HNSW graph.

The HNSW graph consists of multiple layers (see the attached figure).

Step 1: The search begins by selecting a random entry point (current_vertex) at the highest layer.

Step 2: Calculate the distance from the query_vector to the current_vertex, as well as to all neighbors of the current_vertex.

  • Case 1: If the closest vertex is a neighbor of the current_vertex, greedily select that vertex, make it the new current_vertex, and repeat Step 2 on the same layer.

  • Case 2: If the closest vertex is the current_vertex itself and it's not the lowest layer, move down one layer and repeat Step 2. If the current layer is the lowest layer, proceed to Step 3.

Step 3: Ultimately, the current_vertex at the lowest layer is identified as the nearest neighbor for the given query_vector.

reason

During the whole search process, after randomly selecting the entry point we employ a "greedy" strategy to choose the next current_vertex, and at each step only local information is used to make the decision. Because of this, the selected vertex might not be the closest neighbour of the query_vector. To overcome this, HNSW suggests the following:

Instead of finding only one nearest neighbour on each layer, the efSearch (i.e num_candidates) closest nearest neighbours to the query_vector are found and each of these neighbours is used as the entry point on the next layer.

reference => link

Returning to the context of Elasticsearch, each shard has its own HNSW graph. Suppose the goal is to find the 10 nearest neighbors from each shard. We should use more than 10 entry points (i.e., num_candidates/efSearch should be > 10) to ensure more precise vector matches. To enable this, we should add support for k (which would be set to 10 in this case); see the sketch below.
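Concretely, the request might look like this (again, k inside the knn query is the proposed parameter, not an existing one): each shard would explore num_candidates = 50 candidates in its HNSW graph but keep only its 10 best vector matches:

POST my-image-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10,
      "num_candidates": 50
    }
  }
}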

Thanks for your explanation. Good to know that a k parameter at the shard level would still be useful; I will bring this to the team for discussion.

Currently, efSearch is num_candidates and k is size for the knn query. For a standalone knn query, that's all you need. It's only when you use hybrid search that I can see how it makes sense to have a separate k parameter.

Right, this is useful for hybrid search.
Thanks @mayya :pray:. Will look forward to more updates on this.

Just to confirm that we are on the same page:

So after introducing a separate k parameter in the knn query, we will have k * number_of_shards total vector matching candidates, because the lexical and knn merging happens at the shard level (unlike the top-level knn section, where it happens at the coordinator level).

Also, I was wondering: is Elastic planning to introduce function_score/script_score support in the top-level knn section, so that the ranking of vector matching candidates could be influenced by some business logic (e.g. clicks, purchase rate, etc.) and not just by similarity?

I've created an issue for adding a k parameter to the knn query.

For the top-level knn search, please submit a separate discuss topic or GitHub issue.
