Can Elasticsearch automatically normalize vectors for dot_product? I haven't found it in the docs yet, but wanted to double check here before I manually add the step in my code.
Hey @mmaccou,
We do not auto-normalize the vectors.
In 8.12, cosine will be updated so that its performance will be improved like dot-product for typical usage: https://github.com/elastic/elasticsearch/pull/99445
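For context on why normalization matters here: once two vectors are scaled to unit length, their plain dot product equals the cosine similarity of the originals. The snippet below is only a minimal sketch of that identity with made-up example vectors and hypothetical helper names; it is not how Elasticsearch implements either similarity internally.

```python
import math

def dot(a, b):
    # Plain dot product of two equal-length sequences.
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    # L2 (Euclidean) length of a vector.
    return math.sqrt(dot(v, v))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b.
    return dot(a, b) / (norm(a) * norm(b))

def unit(v):
    # Scale v to unit length (assumes v is not the zero vector).
    n = norm(v)
    return [x / n for x in v]

a = [3.0, 4.0, 5.0]
b = [1.0, 2.0, 2.0]

cos = cosine_similarity(a, b)
dot_of_units = dot(unit(a), unit(b))

# The two values agree up to floating-point rounding.
print(math.isclose(cos, dot_of_units))
```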
Oh perfect! Is there a release date for 8.12? Will the performance match dot_product, or will it just be improved, but not quite matched?
Also, a follow-up question, since this is new to me: when I bulk index content into Elasticsearch using an inference pipeline with dot_product similarity, is there a way to normalize the vector first? It's not clear how that would even be possible using the inference pipeline.
Hey there, we can't share exact release dates for this exciting functionality yet, but rest assured it is actively in the works and will be available soon. For more updates, I suggest subscribing to the RSS feed of our more technical site, Elastic Search Labs, so you can see the stuff that will be trickling into 8.12. For instance, see the post "Apache Lucene 9.9, the fastest Lucene release ever".
Hi Serena. Thanks for directing me to the other site. Can you also take a look at my follow up question? I'm trying to figure out the right course of action until 8.12 is released.
When I bulk index content into Elasticsearch using an inference pipeline with dot_product similarity, is there a way to normalize the vector first? It's not clear how that would even be possible using the inference pipeline.
Once 8.12 is released, will I have to re-index everything to take advantage of the cosine updates?
Yeah, the inference processor won't do it for you. Our great SAs at Elastic actually have a few examples, one of which I stole here:
import numpy as np

def normalize_vector(vector):
    # Scale the vector to unit length (L2 norm of 1).
    magnitude = np.linalg.norm(vector)
    if magnitude == 0:
        # A zero vector has no direction, so return it unchanged.
        return vector
    normalized_vector = vector / magnitude
    return normalized_vector

vector = np.array([3, 4, 5])
normalized_vector = normalize_vector(vector)
print("Original Vector:", vector)
# Original Vector: [3 4 5]
print("Normalized Vector:", normalized_vector)
# Normalized Vector: [0.42426407 0.56568542 0.70710678]
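If the embeddings are computed client-side rather than by the inference pipeline, the same idea extends to a whole batch before bulk indexing. What follows is only a sketch under assumptions: the index name `my-index`, the field names `text` and `embedding`, and the helper names are all hypothetical. The resulting dictionaries use the `_index` / `_source` shape that the Python client's bulk helper accepts, but nothing is sent to a cluster here.

```python
import numpy as np

def normalize_rows(matrix):
    """L2-normalize each row of a 2D array; all-zero rows are left unchanged."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Avoid division by zero: divide zero rows by 1 so they stay zero.
    safe_norms = np.where(norms == 0, 1.0, norms)
    return matrix / safe_norms

def build_actions(texts, embeddings, index_name="my-index"):
    """Pair each text with its unit-length embedding as a bulk action dict.

    The index and field names are placeholders for illustration only.
    """
    unit_vectors = normalize_rows(embeddings)
    return [
        {"_index": index_name, "_source": {"text": t, "embedding": v.tolist()}}
        for t, v in zip(texts, unit_vectors)
    ]

texts = ["first doc", "second doc"]
embeddings = [[3, 4, 5], [1, 2, 2]]
actions = build_actions(texts, embeddings)
```

Note that a zero vector stays a zero vector, which is not unit length, so it is probably worth filtering such rows out before indexing with dot_product.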
Thank you for sharing this example! I appreciate it. This addresses the first question, right?
For the second, when 8.12 is released and cosine similarity is improved, will I need to reindex all of my data?
Yes, that code is Python, btw, and should get you started with normalizing vectors. You can't update the similarity setting of an existing index, so you'd want to pick up front which one to go with: cosine or dot_product. If you'd like to switch to cosine later, you'd have to index into a new index with your chosen similarity setting.
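To make the "new index" step concrete, here is a rough sketch of what the two mappings and a reindex request body could look like, written as plain Python dictionaries. The field name `embedding`, the 384 dimensions, and the index names `docs-dot-product` and `docs-cosine` are assumptions for illustration; use your own names and your model's actual output size.

```python
def dense_vector_mapping(similarity, dims=384, field="embedding"):
    """Build an index body with one dense_vector field.

    The field name and dimension count are placeholders, not values
    taken from this thread.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                }
            }
        }
    }

# One body per candidate similarity; each belongs to its own index.
dot_product_index = dense_vector_mapping("dot_product")
cosine_index = dense_vector_mapping("cosine")

# Body for the _reindex API, copying documents from a hypothetical
# existing index into a hypothetical new one.
reindex_body = {
    "source": {"index": "docs-dot-product"},
    "dest": {"index": "docs-cosine"},
}
```

Going the other way, from a cosine index to a dot_product index, the copied vectors would still need to be unit length.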