KNN _score not lining up with similarity filter

Yakob · March 24, 2025, 3:20pm

Thank you so much for your replies, Alex and John! I apologize for my delayed response; I was out of town for a few days last week. To answer your questions:

First, I was wondering why you are using k=10000 and num_candidates=10000. Could you clarify?

The index I am working with has 1,500,000 documents, and the product requirement is to grab as many sufficiently relevant documents as possible. From spot checking, I have found that a similarity of 0.8 seems to be the sweet spot where returned documents are still relevant enough for my app's purpose. Most search vectors will not hit a full 10k results, but occasionally 10k relevant results are expected and correct. The app I'm working on runs this KNN query to build a pool of candidate documents, which is then refined later in the app (I can't use a filtered KNN search because of the issues outlined in my other post here: Efficient Subquery Combinations). As an aside, my team is considering increasing index.max_result_window to 20k, as the 10k limit occasionally keeps us from grabbing a few desired documents.
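
For anyone following along, here is a minimal sketch of what a request body with these settings might look like. The field name `embedding` and the helper `build_knn_body` are hypothetical (my real mapping is not shown here); `k`, `num_candidates`, and `similarity` are the knn search options discussed in this thread, and the 10,000 cap on `num_candidates` is taken from the kNN docs.

```python
from typing import Any

# Documented upper bound for num_candidates in the kNN search option.
MAX_NUM_CANDIDATES = 10_000


def build_knn_body(
    query_vector: list[float],
    field: str = "embedding",  # hypothetical field name, for illustration only
    similarity: float = 0.8,
    k: int = 10_000,
    num_candidates: int = 10_000,
) -> dict[str, Any]:
    """Build a search request body containing only a top-level knn section."""
    # num_candidates may not be smaller than k and may not exceed the cap.
    if not (k <= num_candidates <= MAX_NUM_CANDIDATES):
        raise ValueError("expected k <= num_candidates <= 10000")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            # Minimum vector similarity a document needs to be a match.
            "similarity": similarity,
        }
    }


body = build_knn_body([0.12, -0.03, 0.57])
```

The resulting dict would be sent as the JSON body of a search request; nothing here talks to a cluster.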

Reading through the docs, use of max_size k and num_candidates does seem to be an anti-pattern. I haven't done a lot of vector math up to this point, so I will research the HNSW algorithm so I can better understand those minutiae.

So it should follow this formula: (2 * _score) - 1
Or in your case: (2 * 0.9250033) - 1 = 0.8500066
Which I think makes sense based on what you described: you are seeing the document show up when you have 0.85 similarity as your threshold. I would expect that with any threshold above that, including 0.90, the document will not be returned.

Yes, you are correct! I had already read the KNN document you linked, but I missed that formula. I ran a few more test queries and all of the responses follow the expected values when using the formula:

_score = (similarity + 1) / 2
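
A small sketch of the conversion in both directions, in case it helps someone else sanity-check their own results. It assumes the field uses cosine similarity, which is what the formula in this thread implies; other similarity metrics use different score formulas. The function names are mine, not part of any Elasticsearch client.

```python
def similarity_to_score(similarity: float) -> float:
    """Convert a cosine similarity in [-1, 1] to the _score in [0, 1]."""
    return (similarity + 1) / 2


def score_to_similarity(score: float) -> float:
    """Invert the mapping: recover the cosine similarity from a _score."""
    return (2 * score) - 1


def passes_threshold(score: float, similarity_threshold: float) -> bool:
    """True if a hit with this _score meets the knn `similarity` threshold."""
    return score_to_similarity(score) >= similarity_threshold


# A similarity threshold of 0.8 corresponds to a _score of about 0.9,
# so every returned hit should have a _score at or above that value.
min_score = similarity_to_score(0.8)
```

With this mapping, a hit whose _score converts to a similarity of about 0.85 passes a 0.85 threshold but not a 0.90 one, which matches what I saw in my test queries.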

Thanks for clearing this up for me! This answers my original question, but I'm happy to learn more about KNN or answer any other questions you two have.
