As per the ES8 documentation, when running an ANN query with k=100, Elastic will:
- Find the 100 closest neighbors, using the HNSW approximation
- Calculate each result's similarity and rank them using the specified similarity algorithm (e.g., dot_product)
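
To make the two steps concrete, here is a minimal Python sketch of how I picture them. It is not Elasticsearch's implementation: the `approximate_candidates` helper is hypothetical and only stands in for the HNSW search by scoring a random subset of the vectors, and `exact_rerank` assumes step 2 uses full-precision dot products, which is exactly the part I am unsure about.

```python
import random

def dot(a, b):
    """Exact dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def approximate_candidates(query, vectors, k, sample_size, seed=0):
    """Hypothetical stand-in for step 1 (HNSW): it only scores a random
    subset of the vectors, so it can miss some of the true neighbors."""
    rng = random.Random(seed)
    visited = rng.sample(range(len(vectors)), min(sample_size, len(vectors)))
    visited.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return visited[:k]

def exact_rerank(query, vectors, candidate_ids):
    """Step 2 as I understand it: compute the exact similarity of each
    candidate and sort the candidates by descending score."""
    scored = [(i, dot(query, vectors[i])) for i in candidate_ids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data: 1000 random 8-dimensional vectors and one random query.
rng = random.Random(42)
vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]
query = [rng.uniform(-1, 1) for _ in range(8)]

candidates = approximate_candidates(query, vectors, k=100, sample_size=400)
ranked = exact_rerank(query, vectors, candidates)
```

In this sketch the candidate set may be incomplete, but the order and scores within `ranked` are exact. My question is whether the same holds for the real ANN query.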
I understand that, by design, ANN uses approximations in step 1, so the 100 results aren't guaranteed to be the 100 nearest neighbors. The recall metric in the benchmarks reflects the percentage of the true nearest neighbors that ANN returns, and this is unlikely to be 100%.
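
For clarity, this is the definition of recall I have in mind, as a small Python sketch. The function name `recall_at_k` and the document IDs are made up for illustration and do not come from the benchmarks.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical IDs: the true top 5 vs. what an ANN search returned.
true_top5 = [3, 17, 42, 8, 99]
ann_top5 = [3, 42, 8, 99, 56]

print(recall_at_k(ann_top5, true_top5))  # 0.8 (4 of the 5 true neighbors found)
```

Note that this metric only looks at set membership, so it says nothing about whether the order or scores of the returned results are exact.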
However, are approximations used in step 2 as well? Within the set of 100 results, are the rankings and similarity scores guaranteed to be accurate (e.g., computed with precise dot_product calculations)?