cosineSimilarity: script score query returned an invalid score for doc:

I have a strange issue with the _search API. Other people on this forum who hit similar errors found that the cause was negative scores. In my case, however, I am using dense vectors, and my script_score computes the cosine similarity. I know that cosine similarity can return negative values, so I am adding 1.0 to the score to make sure that it will never be negative.

Just to make sure that my dense vectors don't have any issues, I computed the dot product manually and it works. But when I try to use cosineSimilarity in ES, I get the following error.

Elasticsearch version : 7.4.2
lucene_version: 8.2.0
java version: 1.8.0

I have an index called "my_index" that contains a field called embeddings. Its mapping is the following:

"embeddings": {
    "type": "dense_vector",
    "dims": 300
},

Every single item contains a vector, and it is not null.

GET my_index/_search

{
  "query": {
    "script_score": {
      "query" : {
          "match_all": {}
        },
      "script": {
        "source": "cosineSimilarity(params.queryVector, doc['embeddings']) + 1.0",
        "params": {
          "queryVector": "[my_300_Dimension_Vector]"
        }
      }
    }
  }
}

The error looks like this:

{
    "error": {
        "root_cause": [
            {
                "type": "exception",
                "reason": "script score query returned an invalid score: NaN for doc: 9"
            },
            {
                "type": "exception",
                "reason": "script score query returned an invalid score: NaN for doc: 178"
            },
            {
                "type": "exception",
                "reason": "script score query returned an invalid score: NaN for doc: 24"
            },
            {
                "type": "exception",
                "reason": "script score query returned an invalid score: NaN for doc: 44"
            },
            {
                "type": "exception",
                "reason": "script score query returned an invalid score: NaN for doc: 60"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "webcontent_emb",
                "node": "eIrwXaCYRrWjjOdUJkW66A",
                "reason": {
                    "type": "exception",
                    "reason": "script score query returned an invalid score: NaN for doc: 9"
                }
            },
            {
                "shard": 1,
                "index": "webcontent_emb",
                "node": "aTC9SRxaRIK4gPdqCpEQ-w",
                "reason": {
                    "type": "exception",
                    "reason": "script score query returned an invalid score: NaN for doc: 178"
                }
            },
            {
                "shard": 2,
                "index": "webcontent_emb",
                "node": "eIrwXaCYRrWjjOdUJkW66A",
                "reason": {
                    "type": "exception",
                    "reason": "script score query returned an invalid score: NaN for doc: 24"
                }
            },
            {
                "shard": 3,
                "index": "webcontent_emb",
                "node": "aTC9SRxaRIK4gPdqCpEQ-w",
                "reason": {
                    "type": "exception",
                    "reason": "script score query returned an invalid score: NaN for doc: 44"
                }
            },
            {
                "shard": 4,
                "index": "webcontent_emb",
                "node": "jsHhvQSzRO-zkM--fH1JdA",
                "reason": {
                    "type": "exception",
                    "reason": "script score query returned an invalid score: NaN for doc: 60"
                }
            }
        ]
    },
    "status": 500
}

The problem was caused by the dense vector values themselves.
It seems I had some vectors filled entirely with zeros. That's why cosineSimilarity was returning NaN for those documents. After those entries were removed, the cosine query worked again.
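For anyone who wants to catch these entries before they reach the index, here is a minimal client-side sketch. It assumes the documents are available as plain Python dicts with the vector under an `embeddings` key. The helper name `split_zero_vectors` and the sample documents are hypothetical, not part of any Elasticsearch client.

```python
def split_zero_vectors(docs, field="embeddings"):
    """Partition docs into (indexable, rejected).

    A doc is rejected when its vector is missing, empty, or all zeros,
    since cosine similarity is undefined for a zero-norm vector.
    """
    indexable, rejected = [], []
    for doc in docs:
        vec = doc.get(field) or []
        if any(x != 0.0 for x in vec):
            indexable.append(doc)
        else:
            rejected.append(doc)
    return indexable, rejected


# Hypothetical sample data; real vectors would have 300 dimensions.
docs = [
    {"id": 1, "embeddings": [0.1, 0.0, 0.2]},
    {"id": 2, "embeddings": [0.0, 0.0, 0.0]},
]
indexable, rejected = split_zero_vectors(docs)
print([d["id"] for d in indexable])  # [1]
print([d["id"] for d in rejected])   # [2]
```

Only the `indexable` list would then be sent to the index. The rejected documents can be logged or re-embedded.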


Hello @lok63, thanks for raising this point. At first I thought it may be a bug, but the behavior actually seems correct to me -- cosine similarity is not defined if one of the vectors is all zeros. If you have any ideas for how we could better handle this case, feel free to open a GitHub issue to start a discussion.
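To illustrate why the score comes out as NaN, here is a small Python sketch. It assumes the usual definition: the dot product divided by the product of the two vector norms. The function is only an illustration, not the actual Elasticsearch implementation. With an all-zero vector both the numerator and the denominator are zero. In Java floating-point arithmetic 0.0 / 0.0 evaluates to NaN, and adding 1.0 to NaN is still NaN.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; NaN if either norm is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    denom = norm_a * norm_b
    if denom == 0.0:
        # Python raises ZeroDivisionError for 0.0 / 0.0, so NaN is returned
        # explicitly here to mirror what Java float division produces.
        return float("nan")
    return dot / denom


# Hypothetical 3-dimensional vectors, for illustration only.
query = [0.1, 0.2, 0.3]
ok_doc = [0.3, 0.2, 0.1]
zero_doc = [0.0, 0.0, 0.0]

print(cosine_similarity(query, ok_doc) + 1.0)    # a finite score in [0, 2]
print(cosine_similarity(query, zero_doc) + 1.0)  # nan
```

The `+ 1.0` offset in the script guards against negative scores, but it cannot repair an undefined one.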
