Weird behavior of dot_product similarity on dense_vector field

elaj · January 9, 2025, 3:27pm

Hello,

TLDR: What's the "indexation" difference between different similarities for dense_vector field?

I have an index with a dense_vector field defined with similarity: cosine.
Now, I want to experiment with similarity:dot_product.

For that, I added two additional fields to my mappings, which will be populated via the ingest pipeline.

Original field feature1_vector with similarity: cosine -> embeddings come from the processor tagged inference 1, which calls a custom-trained ML model hosted in ES
New field feature2_vector_inference with similarity: dot_product -> embeddings come from the processor tagged inference 2, which is otherwise identical to the previous one
New field feature3_vector_copied with similarity: dot_product -> embeddings come from a set processor, which copies them from the first field, feature1_vector

Mapping:

"feature1_vector": { -- ORIGINAL FIELD
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "cosine",
  "index_options": {
    "type": "int8_hnsw",
    "m": 16,
    "ef_construction": 100
  }
},
"feature2_vector_inference": { -- NEW FIELD TO BE POPULATED BY INFERENCE PROCESSOR
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "dot_product",
  "index_options": {
    "type": "int8_hnsw",
    "m": 16,
    "ef_construction": 100
  }
},
"feature3_vector_copied": { -- NEW FIELD TO BE POPULATED BY SET PROCESSOR
  "type": "dense_vector",
  "dims": 384,
  "index": true,
  "similarity": "dot_product",
  "index_options": {
    "type": "int8_hnsw",
    "m": 16,
    "ef_construction": 100
  }
}

Ingest pipeline (shortened version):

[
    {
      "inference": {
        "tag": "inference 1",
        "model_id": "candidate_a",
        "input_output": [
          {
            "input_field": "feature1",
            "output_field": "feature1_vector"
          }
        ],
        "ignore_failure": false,
        "on_failure": [...]
      }
    },
    {
      "inference": {
        "tag": "inference 2",
        "model_id": "candidate_a",
        "input_output": [
          {
            "input_field": "feature1",
            "output_field": "feature2_vector_inference"
          }
        ],
        "ignore_failure": false,
        "on_failure": [...]
      }
    },
    {
      "set": {
        "field": "feature3_vector_copied",
        "copy_from": "feature1_vector"
      }
    }
]

After reindexing some documents, I see that all three fields contain precisely the same embeddings (I'm referring to the _source returned by my kNN query).

My questions:

Is that expected?
If so, why can I select the similarity only at index time and not at query time?
Shouldn't copying embeddings from feature1_vector to feature3_vector_copied, which use different similarities, simply fail?

Maybe someone can shed some light on this.
Thanks!

Kathleen_DeRusso · January 9, 2025, 4:08pm

Starting with 8.12, fields that use cosine similarity have their vectors normalized automatically, and the dot product calculation is used out of the box as a performance enhancement. You can read more details in the blog or the PR if you're interested.
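
To illustrate why this works, here is a minimal sketch in plain Python (not Elasticsearch code; the helper names dot, normalize, and cosine are made up for this example). It only shows the underlying identity: the cosine of two vectors equals the dot product of their unit-length versions.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (hypothetical helper for this sketch)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed directly from the definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

cos_ab = cosine(a, b)
dot_of_normalized = dot(normalize(a), normalize(b))

# Both routes give the same similarity, up to floating-point rounding.
assert math.isclose(cos_ab, dot_of_normalized, rel_tol=1e-9)
```

Once vectors are normalized up front, the per-comparison work no longer needs the two magnitude computations, which is where the performance gain would come from.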

elaj · January 10, 2025, 11:54am

Thank you for your reply.

May I ask two follow-up questions?

Why keep both cosine and dot_product as options for the dense_vector similarity setting if cosine internally uses dot_product for computing similarity (which is more efficient)? Are there other cases where choosing cosine or dot_product actually makes a difference?
Would indexing be faster if I explicitly define similarity: dot_product in the index?

Thank you

Kathleen_DeRusso · January 10, 2025, 1:10pm

To maintain backward compatibility, we would not remove a GA feature without warning. I think you'd have to benchmark to see whether there is a real difference with your models.
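
One place where the two settings do differ, per the dense_vector documentation: dot_product requires float vectors to be unit length, while cosine accepts any non-zero vector. That would also explain why copying into the dot_product field did not fail here, assuming the model already emits unit-length embeddings. A rough sketch for checking this on your own vectors before benchmarking (plain Python; is_unit_length is a hypothetical helper and the 1e-4 tolerance is an assumption, not a value taken from this thread):

```python
def is_unit_length(vector, tol=1e-4):
    """Return True if the squared magnitude is within `tol` of 1.0.

    `tol` is an assumed tolerance for this sketch; adjust as needed.
    """
    squared_magnitude = sum(x * x for x in vector)
    return abs(squared_magnitude - 1.0) <= tol


# A unit-length vector would be acceptable for a dot_product field.
normalized_example = [0.6, 0.8]
# A non-normalized vector is the kind that cosine accepts but dot_product rejects.
raw_example = [3.0, 4.0]

unit_ok = is_unit_length(normalized_example)
raw_ok = is_unit_length(raw_example)
```

If the check fails for some embeddings, cosine is the safer setting, since those documents would be rejected by a dot_product field.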
