Bug while trying to compute cosineSimilarity in a for loop

Joao_Luiz · January 3, 2025, 4:01pm

Hello everyone! I'm currently working with Elasticsearch to produce some features to use as input for my machine learning model. One of the features is the cosineSimilarity between each of a user's rating embeddings and the document's titleEmbedding, multiplied by that rating's associated score. My current query:

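{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": """
          double maxVal = 0.0;
          int index = 0;
          for (int i = 0; i < params.ratings.length; i++) {
            // +1 keeps the similarity non-negative, since script_score does not allow negative scores
            double sim = cosineSimilarity(params.ratings[i]['embedding'], 'titleEmbedding') + 1;
            if (sim > maxVal) {
              maxVal = sim;
              index = i;
            }
          }
          // best similarity times the score of the rating that produced it
          return maxVal * params.ratings[index]['score'];
        """,
        "params": {
          "ratings": [
            {
              "embedding": [embedding A],
              "score": 25
            }...
          ]
        }
      }
    }
  }
}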

As you can see, the idea is to capture maxVal, the maximum cosine similarity across all ratings, and multiply it by the score associated with that rating. That should be it. For some reason, however, the logic only performs the cosineSimilarity with the first embedding (position 0 in ratings) and then multiplies it by the nth associated score. That behaviour is wrong. Does anyone know what could be happening?
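To check what the script should return for a given document, here is a minimal reference sketch of the intended logic outside Elasticsearch: for every rating, compute cosineSimilarity(embedding, titleEmbedding) + 1, keep the largest value, and multiply it by that rating's score. The sample vectors and scores are made up, `title_embedding` stands in for the document's titleEmbedding field, and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    # Plain cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_similarity_score(ratings, title_embedding):
    # Mirrors the Painless loop: track the best (similarity + 1) and its index.
    max_val = 0.0
    index = 0
    for i, rating in enumerate(ratings):
        sim = cosine_similarity(rating["embedding"], title_embedding) + 1
        if sim > max_val:
            max_val = sim
            index = i
    # Multiply the best similarity by the score of the rating that produced it.
    return max_val * ratings[index]["score"]


# Hypothetical sample data: the second rating points in the same direction as the title.
ratings = [
    {"embedding": [1.0, 0.0], "score": 25},
    {"embedding": [0.0, 1.0], "score": 10},
]
title_embedding = [0.0, 2.0]
print(max_similarity_score(ratings, title_embedding))  # 20.0
```

With this data the second rating wins (similarity 1 + 1 = 2), so the expected result is 2 × 10 = 20.0. If only the first embedding were compared, the result would instead be (0 + 1) × 25 = 25.0, which makes this kind of toy input handy for spotting the behaviour described above.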
