Example of indexing a document with dot_product similarity on a dense_vector field

Hi,

I'm trying to benchmark different options for dense_vector fields and kNN search.

I want to compare cosine similarity with dot_product similarity, because the documentation says that dot_product is an optimized way to perform cosine similarity.
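For context on that claim: cosine similarity is the dot product divided by the product of the two vectors' magnitudes, so when both vectors are already unit length the division is by 1 and the plain dot product gives the same score. A minimal Java sketch of that identity (the class and method names are illustrative only, not part of Elasticsearch):

```java
// Sketch: cosine similarity equals the dot product of the two vectors
// after each one has been scaled to unit length.
public class CosineVsDot {

    // Dot product: sum of element-wise products.
    public static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Euclidean (L2) norm: square root of the sum of squared components.
    public static double norm(double[] v) {
        return Math.sqrt(dot(v, v));
    }

    // Cosine similarity computed directly: dot(a, b) / (|a| * |b|).
    public static double cosine(double[] a, double[] b) {
        return dot(a, b) / (norm(a) * norm(b));
    }

    // Returns a copy of v scaled to unit length (assumes v is not all zeros).
    public static double[] unit(double[] v) {
        double n = norm(v);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {3.0, 1.0};
        // Both lines print the same value, up to floating-point rounding.
        System.out.println(cosine(a, b));
        System.out.println(dot(unit(a), unit(b)));
    }
}
```

Skipping the square roots and the division on every comparison is presumably where the speedup comes from; the cost moves to normalizing each vector once, before it is indexed.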

When I try to index a document with dot_product, I get an error whose meaning I'm not sure of.

The [dot_product] similarity can only be used with unit-length vectors. Preview of invalid vector:

The dot_product documentation mentions unit-length vectors, but I'm not sure what that means. I didn't find any example of indexing documents with dot_product similarity.
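In this context, "unit-length" means the vector's Euclidean (L2) norm, the square root of the sum of its squared components, equals 1. For [1.0, 2.0] the sum of squares is 1 + 4 = 5, so its length is about 2.236 and the vector is rejected. A small Java sketch of such a check; the tolerance is an assumption for illustration, not the value Elasticsearch uses internally:

```java
// Sketch: what "unit-length" means for a vector.
public class UnitLengthCheck {

    // Hypothetical tolerance, chosen only for this illustration.
    static final double TOLERANCE = 1e-4;

    // Squared Euclidean norm: sum of the squares of the components.
    public static double squaredNorm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // A vector is unit length when its squared norm (and hence its norm) is 1,
    // allowing for a little floating-point error.
    public static boolean isUnitLength(double[] v) {
        return Math.abs(squaredNorm(v) - 1.0) <= TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(isUnitLength(new double[] {1.0, 2.0})); // false: squared norm is 5
        System.out.println(isUnitLength(new double[] {0.6, 0.8})); // true: 0.36 + 0.64 = 1
    }
}
```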

My example is:

PUT test
{
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 2,
        "index": true,
        "similarity": "dot_product",
        "index_options": {
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "vector": [1.0, 2.0]
}

(I tested with both [1.0, 2.0] and [1, 2].)

And the response:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The [dot_product] similarity can only be used with unit-length vectors. Preview of invalid vector: [1.0, 2.0]"
    }
  },
  "status": 400
}

The same example with cosine similarity works as expected.

Could you give me any info about that?

Thanks in advance.

Adrian.

I think I've found the solution.
The vector has to be normalized to unit length (each component divided by the vector's magnitude) before it is indexed.

I used this code to achieve that:

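// Scales a vector to unit length (Euclidean norm of 1), as required by
// dot_product similarity.
public static double[] normalizeVector(double[] vector) {
    // Magnitude = square root of the sum of squared components.
    double magnitude = 0.0;
    for (double component : vector) {
        magnitude += component * component;
    }
    magnitude = Math.sqrt(magnitude);
    // A zero vector has no direction; dividing by 0 would produce NaN values.
    if (magnitude == 0.0) {
        throw new IllegalArgumentException("Cannot normalize a zero-length vector");
    }
    // Divide each component by the magnitude.
    double[] normalizedVector = new double[vector.length];
    for (int i = 0; i < vector.length; i++) {
        normalizedVector[i] = vector[i] / magnitude;
    }
    return normalizedVector;
}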

After normalization, the vector [1.0, 2.0] becomes [0.4472136, 0.89442719].

With the normalized vector, this index request works fine:

PUT test/_doc/1
{
  "vector": [0.4472136, 0.89442719]
}
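The normalization and the index request can be tied together. The sketch below, with hypothetical class and method names and no Elasticsearch client, normalizes a vector and renders the request body as a JSON string; printing full double precision avoids rounding the values by hand.

```java
import java.util.Arrays;

// Sketch: normalize a vector and build the JSON body for the index request.
public class NormalizedDocBody {

    // Same approach as normalizeVector above: divide by the Euclidean norm.
    public static double[] normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double magnitude = Math.sqrt(sumOfSquares);
        if (magnitude == 0.0) {
            throw new IllegalArgumentException("Cannot normalize a zero-length vector");
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / magnitude;
        }
        return out;
    }

    // Arrays.toString renders "[x, y]", which is also a valid JSON array
    // as long as every value is finite (the zero-vector guard ensures that).
    public static String docBody(double[] vector) {
        return "{\"vector\": " + Arrays.toString(normalize(vector)) + "}";
    }

    public static void main(String[] args) {
        // Prints a body usable with PUT test/_doc/1
        System.out.println(docBody(new double[] {1.0, 2.0}));
    }
}
```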

Thanks heaps for sharing your solution!
