Accuracy of the scores and rankings in ANN output

rajivhs · June 14, 2023, 6:34pm

As per the ES8 documentation, when running an ANN query with k=100, Elastic will:

1. Find the 100 closest neighbors, using the HNSW approximation
2. Calculate each result's similarity and rank them using the specified similarity algorithm (e.g., dot_product)

I understand that, by design, ANN uses approximations in step 1, so the 100 results aren't guaranteed to be the 100 nearest neighbors. The recall metric in the benchmarks reflects the percentage of the true nearest neighbors that ANN returns, and this is unlikely to be 100%.

However, are approximations used in step 2 as well? Within the set of 100 results, are the rankings and similarity scores guaranteed to be accurate (e.g., computed with precise dot_product calculations)?

BenTrent · June 14, 2023, 7:37pm

Hey @rajivhs ,

The dot_product calculation is not approximate. As vectors are stored, the graph is built with the actual dot_product calculation, and when the graph is explored, we calculate the actual dot_product between vectors.
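
To make the distinction concrete, here is a minimal Python sketch, not Elasticsearch code. It assumes unit-length float vectors and uses the `_score` mapping that the dense_vector documentation gives for `dot_product` similarity, `(1 + dot_product(query, vector)) / 2`. The "approximate" search is only a stand-in: it scores a random subset of the index instead of walking a real HNSW graph, and the helper names (`rank_exact`, `recall`, `es_dot_product_score`) are made up for illustration. The point it tries to show is that recall can fall below 100% while every returned score is still an exact calculation.

```python
import random


def dot(a, b):
    """Exact dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (dot_product similarity expects this)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def es_dot_product_score(query, vector):
    """_score mapping documented for float vectors with dot_product similarity."""
    return (1.0 + dot(query, vector)) / 2.0


def rank_exact(query, vectors, candidate_ids):
    """Score every candidate with the exact similarity and sort best-first."""
    scored = [(i, es_dot_product_score(query, vectors[i])) for i in candidate_ids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def recall(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors present in the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)
dims, n, k = 8, 500, 10
vectors = [normalize([rng.gauss(0, 1) for _ in range(dims)]) for _ in range(n)]
query = normalize([rng.gauss(0, 1) for _ in range(dims)])

# Ground truth: brute-force scan over the whole index.
exact_top_k = [i for i, _ in rank_exact(query, vectors, range(n))[:k]]

# Stand-in for the approximate step (NOT real HNSW): only a random
# 60% of the index is ever visited, so some true neighbors can be missed.
visited = rng.sample(range(n), 300)
approx_top_k = rank_exact(query, vectors, visited)[:k]
approx_ids = [i for i, _ in approx_top_k]

print("recall@k:", recall(approx_ids, exact_top_k))
for doc_id, score in approx_top_k:
    # Each returned score equals the exact calculation for that vector.
    assert score == es_dot_product_score(query, vectors[doc_id])
```

In this sketch the candidate set may differ from the brute-force top k, but the order and scores inside it are computed exactly, which matches the behavior described above.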
