KNN _score not lining up with similarity filter

john-wagster · March 24, 2025, 4:03pm

Thanks for clearing this up for me! This answers my original question, but I'm happy to learn more about KNN or answer any other questions you two have.

Awesome!

Reading through the docs, using the maximum size for k and num_candidates does seem to be an anti-pattern. I haven't done a lot of vector math up to this point, so I will research the HNSW algorithm so I can better understand those minutiae.

If it helps at all, I'm happy to talk through this in a little more detail; I work on that part of the ES stack.

The app I'm working on runs this KNN query to build a pool of possible final documents, which is then further refined later in the app (I can't use a filtered KNN search because of the issues outlined in my other post here: Efficient Subquery Combinations).

I missed the original post you had created; apologies for that. I took a brief look and would be curious where you've gotten with that query. We might be able to iterate on it here a bit, and you can always request consulting services too. It kinda sounds like you've already tried a good bit of stuff, though. I'm also not sure about the RRF (reciprocal rank fusion) suggestion, but if you reached out to consulting or support, they might (I'm honestly not sure) be willing to help you out, for example by giving you some free cycles to play around with RRF.
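For anyone following along who hasn't used RRF: the core idea is to merge several ranked result lists by giving each document 1 / (rank_constant + rank) from every list it appears in and summing those contributions. Here's a rough Python sketch of that formula, not Elasticsearch's implementation. The function name is just for illustration; 60 is the constant from the original RRF paper and, as far as I know, also the default for Elasticsearch's rank_constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, rank_constant=60):
    """Fuse several ranked lists of doc IDs into one ranking (sketch).

    Each document scores sum(1 / (rank_constant + rank)) over every list it
    appears in, with ranks starting at 1. Documents missing from a list simply
    get no contribution from that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    knn_hits = ["d1", "d2", "d3"]   # e.g. hits from a knn query
    bm25_hits = ["d3", "d1", "d4"]  # e.g. hits from a lexical query
    print(reciprocal_rank_fusion([knn_hits, bm25_hits]))
```

Documents that rank reasonably well in several lists (d1 and d3 above) float to the top even if neither list put them first everywhere, which is why RRF comes up for combining sub-queries without having to normalize their raw scores.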

For reference, pulled from your other post:

A' = A

B' = B - A

C' = C - (A + B)

D' = D - (A + B + C)
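To make the notation concrete: reading "+" as set union and "-" as set difference (my interpretation of your notation), here's a rough Python sketch that turns ranked result pools A, B, C, D into the disjoint pools A', B', C', D' while keeping each pool's original ranking. The function name disjoint_pools is hypothetical, just for illustration.

```python
def disjoint_pools(pools):
    """Compute A' = A, B' = B - A, C' = C - (A + B), ... over ranked lists.

    Each pool keeps, in its original order, only the documents that no
    earlier pool has already returned ("+" read as union, "-" as difference).
    """
    claimed = set()  # union of every earlier pool seen so far
    result = []
    for pool in pools:
        result.append([doc for doc in pool if doc not in claimed])
        claimed.update(pool)
    return result


if __name__ == "__main__":
    a, b, c, d = ["a", "b"], ["b", "c"], ["a", "c", "d"], ["d", "e"]
    print(disjoint_pools([a, b, c, d]))
```

Note that each later pool subtracts the *original* earlier pools (A + B), not the already-reduced ones (A' + B'); for this definition both give the same union, so tracking one running set is enough.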

Some high-level thoughts (without having spent a ton of time thinking about the queries in your other post):

For what it's worth, num_candidates determines how much of the HNSW graph gets explored, so a large value means you're spending a lot more time exploring it. At some point, explorations that large (for example, if you bump the limit up past 10k) will probably start to time out. If k and num_candidates are the same (say, both 10k), what you get back is all of the closest 10k results found in that HNSW graph exploration, and k becomes mostly irrelevant. If you really need all of those results back, that's probably the best you can do.

If you can run multiple queries, though, it might ultimately be more efficient. You might have tried this already, but querying for A and then querying for B with filter criteria that eliminate all of the docs from A, each with a smaller num_candidates, may be more efficient (I'm honestly not sure; I'd be curious to learn whether that's the case, and whether I'm understanding what you're trying to do here). Case 2 from your other post also seems like it could be a subsequent metadata-only query.

Fun problem space nonetheless, but it definitely seems like you'll have to play around here to get to an efficient query.
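A rough sketch of the "query A first, then query B while excluding A's docs" idea, building the request bodies in Python. This assumes a top-level knn search and uses a bool must_not ids clause inside the knn filter to drop documents already returned; the field names "embedding" and "pool", the helper name knn_request, and the k/num_candidates values are all hypothetical placeholders, not recommendations. Send the bodies with whatever client you already use.

```python
def knn_request(field, query_vector, k, num_candidates,
                exclude_ids=(), extra_filter=None):
    """Build a search body for a top-level Elasticsearch knn search (sketch).

    exclude_ids: doc IDs already returned by an earlier query (e.g. pool A),
        removed through a bool/must_not ids clause so they can't come back.
    extra_filter: an optional query clause for the current pool's criteria.
    """
    if num_candidates < k:
        # Elasticsearch rejects a num_candidates smaller than k.
        raise ValueError("num_candidates must be >= k")
    knn = {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
    bool_clause = {}
    if exclude_ids:
        bool_clause["must_not"] = [{"ids": {"values": list(exclude_ids)}}]
    if extra_filter is not None:
        bool_clause["filter"] = [extra_filter]
    if bool_clause:
        # The knn filter is applied during the vector search, not afterwards,
        # so excluded docs don't use up any of the k result slots.
        knn["filter"] = {"bool": bool_clause}
    return {"knn": knn, "_source": False}


if __name__ == "__main__":
    query_vector = [0.12, -0.4, 0.9]  # hypothetical embedding
    body_a = knn_request("embedding", query_vector, k=100, num_candidates=500,
                         extra_filter={"term": {"pool": "A"}})
    ids_a = ["doc-1", "doc-7"]  # placeholder for the IDs in the A search's hits
    body_b = knn_request("embedding", query_vector, k=100, num_candidates=500,
                         exclude_ids=ids_a, extra_filter={"term": {"pool": "B"}})
    print(body_b["knn"]["filter"])
```

Whether this actually beats one big k = num_candidates query is an open question; a very long must_not ids list has its own cost, so it's worth measuring on your data.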

Related topics

Topic	Category / tags	Replies	Views	Activity
Difference between KNN similarity and document score	Elastic Search, elastic-app-search	2	206	March 27, 2024
The num_candidates parameter leads to some confusing query results	Elasticsearch, vector-search	7	393	June 9, 2024
Question about knn query on nested field and similarity parameter	Elasticsearch, vector-search	7	728	July 19, 2024
Why "knn_query" doesn’t have a separate k parameter?	Elasticsearch, vector-search	13	726	May 9, 2024
Similarity field in KNN	Elasticsearch, vector-search	5	1050	January 1, 2024