Similarity field in KNN

When I use the similarity field in a kNN query, the response is not what I expect. I set similarity to 0.6 and boost to 1. Even though more than 30 documents have a score greater than 0.7, only one document is returned.

If the similarity field is omitted from the query, the documents returned all have a score greater than 0.7.

Query: [posted as an image, not reproduced here]

ES Version: 8.11

@Sai_Krishna_D ,

What is the similarity function set up for your index?

Note: similarity in this case refers to the actual vector similarity, not the _score. The _score is computed from that similarity by a transformation and then has boosts, etc. applied.

For example, if you are using dot_product, setting similarity to 0.6 requires the vectors to have a dot_product of at least 0.6.

The document _score as it relates to dot_product is (1 + similarity) / 2, so with a boost of 1 a similarity of 0.6 only admits documents with a _score of at least 0.8.

If you consider documents with a score of 0.7 adequately similar, you should set your similarity threshold to 0.7 * 2 - 1, or 0.4.
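
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names, the field name `image_vector`, and the sample vector are made up, a boost of 1 is assumed, and the transform is the (1 + similarity) / 2 relation stated above for dot_product (the Elasticsearch docs describe the same relation for cosine). The parameter names in the returned dict follow the kNN search options as I understand them, so check them against the docs for your version.

```python
def similarity_to_score(similarity: float) -> float:
    """_score implied by a vector similarity, assuming boost = 1."""
    return (1.0 + similarity) / 2.0


def score_to_similarity(score: float) -> float:
    """Inverse of the transform above: the similarity threshold for a target _score."""
    return 2.0 * score - 1.0


def build_knn_clause(field, query_vector, k, num_candidates, min_score):
    """Build a kNN clause whose similarity threshold matches a minimum _score.

    Hypothetical helper: it only assembles a dict and does not call Elasticsearch.
    """
    return {
        "field": field,
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
        # Threshold on the raw vector similarity, not on _score.
        "similarity": score_to_similarity(min_score),
        "boost": 1.0,
    }


if __name__ == "__main__":
    # A similarity of 0.6 corresponds to a much stricter _score cutoff than 0.7.
    print(similarity_to_score(0.6))
    # The threshold to use when documents scoring 0.7 or more are wanted.
    print(score_to_similarity(0.7))
    print(build_knn_clause("image_vector", [0.1, 0.2, 0.3], 10, 100, 0.7))
```

Because of floating-point rounding, the computed threshold may print as a value very slightly below 0.4 rather than exactly 0.4.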

@BenTrent
We are using cosine similarity.
Anyway, your answer has resolved my issue. Thanks for the quick response. Could this also be added to the documentation, to avoid this gap in understanding?

Thanks and Regards
Sai Krishna D

@Sai_Krishna_D you are correct, our documentation here isn't great. I will update the "k-nearest neighbor (kNN) search" page of the Elasticsearch Guide [8.11] to explain the calculations and how to get the right similarity!

Thank you @BenTrent
