Can Elasticsearch automatically normalize vectors for dot_product?

mmaccou · December 12, 2023, 1:16pm

Can Elasticsearch automatically normalize vectors for dot_product? I haven't found it in the docs yet, but wanted to double-check here before I manually add the step in my code.

BenTrent · December 12, 2023, 1:35pm

Hey @mmaccou,

We do not auto-normalize the vectors.

In 8.12, cosine will be updated so that, for typical usage, its performance improves to be similar to dot_product: https://github.com/elastic/elasticsearch/pull/99445
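Because normalization is left to you, it can help to check vectors before sending them to a dot_product field, and to see why normalizing up front makes dot_product equivalent to cosine. This is a minimal sketch assuming NumPy; `is_unit_length` and its tolerance of 1e-4 are hypothetical choices for illustration, not Elasticsearch's own validation logic.

```python
import numpy as np

def is_unit_length(vector, tolerance=1e-4):
    # Hypothetical pre-indexing check: dot_product expects vectors of length 1.
    # The tolerance is an assumed value, not the one Elasticsearch uses.
    return abs(np.linalg.norm(vector) - 1.0) <= tolerance

def cosine_similarity(a, b):
    # Cosine divides the dot product by both magnitudes on every comparison
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dot_of_normalized(a, b):
    # Normalizing first means a plain dot product yields the same value
    return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

a = np.array([3.0, 4.0, 5.0])
b = np.array([1.0, 0.0, 2.0])

print(is_unit_length(a))                      # False
print(is_unit_length(a / np.linalg.norm(a)))  # True
print(np.isclose(cosine_similarity(a, b), dot_of_normalized(a, b)))  # True
```

Paying the normalization cost once, at index time, is what lets dot_product skip the per-comparison magnitude work that cosine does.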

mmaccou · December 12, 2023, 1:50pm

Oh perfect! Is there a release date for 8.12? Will the performance match dot_product, or will it just be improved, but not quite matched?

Also, a follow-up question, since this is new to me: when I bulk index content into Elasticsearch using an inference pipeline with dot_product similarity, is there a way to normalize the vector first? It's not clear how that would even be possible using the inference pipeline.

Serena_Chou · December 12, 2023, 4:17pm

Hey there, we can't share exact release dates for this exciting functionality yet, but rest assured it is actively in the works and will be available soon. For more updates, I suggest subscribing to our more technical site, Elastic Search Labs, with your RSS reader so you can see what will be trickling into 8.12. For instance, see "Apache Lucene 9.9, the fastest Lucene release ever" on Elastic Search Labs.

mmaccou · December 12, 2023, 4:40pm

Hi Serena. Thanks for directing me to the other site. Can you also take a look at my follow up question? I'm trying to figure out the right course of action until 8.12 is released.

When I bulk index content into Elasticsearch using an inference pipeline with dot_product similarity, is there a way to normalize the vector first? It's not clear how that would even be possible using the inference pipeline.
Once 8.12 is released, will I have to reindex everything to take advantage of the cosine updates?

Serena_Chou · December 14, 2023, 11:09pm

Yeah, the inference processor won't do it for you. Our great SAs at Elastic have a few examples; here's one I stole from them:

```python
import numpy as np

def normalize_vector(vector):
    # Scale the vector to unit length (L2 norm of 1)
    magnitude = np.linalg.norm(vector)
    if magnitude == 0:
        # A zero vector cannot be normalized; return it unchanged
        return vector
    normalized_vector = vector / magnitude
    return normalized_vector

# test
vector = np.array([3, 4, 5])
normalized_vector = normalize_vector(vector)
print("Original Vector:", vector)
# Original Vector: [3 4 5]
print("Normalized Vector:", normalized_vector)
# Normalized Vector: [0.42426407 0.56568542 0.70710678]
```
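Since the inference processor won't normalize for you, one workaround is to compute embeddings on the client and normalize them before bulk indexing. This is a sketch assuming the official `elasticsearch` Python client, whose `helpers.bulk` accepts an iterable of action dicts; the index name `my-vectors`, the field name `embedding`, and the `build_bulk_actions` helper are hypothetical. It reuses the same normalization logic as the example above.

```python
import numpy as np

def normalize_vector(vector):
    # Same unit-length scaling as above; zero vectors are returned unchanged
    magnitude = np.linalg.norm(vector)
    if magnitude == 0:
        return vector
    return vector / magnitude

def build_bulk_actions(docs, index_name="my-vectors", field="embedding"):
    # Hypothetical helper: yields one index action per (doc_id, text, vector) tuple,
    # normalizing each vector before it is sent to Elasticsearch
    for doc_id, text, vector in docs:
        unit = normalize_vector(np.asarray(vector, dtype=float))
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {"text": text, field: unit.tolist()},
        }

docs = [("1", "hello world", [3, 4, 5]), ("2", "goodbye", [0.5, 0.0, 0.5])]
actions = list(build_bulk_actions(docs))
print(actions[0]["_source"]["embedding"])

# With a running cluster (not executed here):
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch("http://localhost:9200")
# helpers.bulk(es, build_bulk_actions(docs))
```

A zero vector passes through unnormalized, so you may want to filter those out before indexing into a dot_product field.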

mmaccou · December 15, 2023, 12:51am

Thank you for sharing this example! I appreciate it. This addresses the first question, right?
For the second, when 8.12 is released and cosine similarity is improved, will I need to reindex all of my data?

Serena_Chou · December 19, 2023, 10:01pm

Yes, that code is Python, by the way, and should get you started with normalizing vectors. You can't update the similarity setting of an existing index, so you'll want to choose between cosine and dot_product up front. If you'd like to switch to cosine later, you'd have to reindex your data into a new index with the chosen similarity setting.
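A minimal sketch of that migration, assuming the elasticsearch-py 8.x client: create a new index whose dense_vector field uses the similarity you want, then copy the documents over with the Reindex API. The index names, the `embedding` field, and `dims=384` are placeholders, and `new_index_mapping` / `reindex_request` are hypothetical helpers that build the request bodies as plain dicts so they can be inspected without a cluster.

```python
def new_index_mapping(dims, similarity="cosine", field="embedding"):
    # Similarity can't be changed after creation, so it is chosen here
    return {
        "properties": {
            "text": {"type": "text"},
            field: {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": similarity,
            },
        }
    }

def reindex_request(source_index, dest_index):
    # Reindex API body: copy every document from the old index into the new one
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}

mappings = new_index_mapping(dims=384)
request = reindex_request("my-vectors-dot", "my-vectors-cosine")
print(mappings["properties"]["embedding"]["similarity"])  # cosine

# With a running cluster (not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# es.indices.create(index="my-vectors-cosine", mappings=mappings)
# es.reindex(**request)
```

Vectors already normalized for dot_product remain valid under cosine, so the copy itself needs no transformation; going the other way, from cosine to dot_product, would require normalizing the vectors first.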

system · January 16, 2024, 10:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Category / tags | Replies | Views | Activity
Is it possible to do vectors dot-product and fetch results with top score? | Elasticsearch | 1 | 787 | July 6, 2017
Example of dot_product similarity on dense_vector field index document | Elasticsearch, vector-search | 3 | 1653 | June 5, 2023
About the dense vector compression | Elasticsearch, vector-search | 2 | 1097 | December 20, 2022
Storage issues in es vectorized retrieval | Elasticsearch | 4 | 33 | October 8, 2024
Dot product in Elastic Search | Elasticsearch | 2 | 2247 | July 5, 2017