Compile error when using sparse vector and cosine similarity

I am trying to use the new functionality for vector fields, but I am running into problems. Let me break down the problem for you:

My documents are JSON files with this structure:

{"name": "doc_name", "field_1": "doc_id", "field_2": "a_keyword", "text": "a rather long text", "embedding": {"4655": 0.040158602078116556, "4640": 0.040158602078116556}}

After creating my index, I am passing a mapping using:

curl -X PUT "localhost:9200/tutorial/_mapping" -H 'Content-Type: application/json' -d'{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "field_1": {
      "type": "keyword"
    },
    "field_2": {
      "type": "keyword"
    },
    "text": {
      "type": "text"
    },
    "embedding": {
      "type": "sparse_vector"
    }
  }
}'

Note that before I included this, Elasticsearch protested when uploading documents, since they had too many fields (each key in embedding was taken as a field called embedding.key). This is not happening now, so this part seemed to be working.
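To make the field explosion described above concrete, here is a minimal Python sketch that lists the leaf fields dynamic mapping would encounter in one document when embedding is not mapped as sparse_vector. The helper name leaf_fields is hypothetical, and the count is only a rough lower bound, since Elasticsearch also counts object fields and multi-fields towards its total field limit.

```python
def leaf_fields(doc, prefix=""):
    """Return the dotted paths of all leaf values in a nested dict."""
    paths = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Without an explicit mapping, each nested key becomes its own field.
            paths.extend(leaf_fields(value, path))
        else:
            paths.append(path)
    return paths


# Same shape as the sample document in this post.
doc = {
    "name": "doc_name",
    "field_1": "doc_id",
    "field_2": "a_keyword",
    "text": "a rather long text",
    "embedding": {"4655": 0.040158602078116556, "4640": 0.040158602078116556},
}

fields = leaf_fields(doc)
print(len(fields), fields)
```

With real embeddings, every distinct dimension key across all documents adds one more embedding.key field, which is how the field limit gets hit so quickly.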

Note also that I always delete my index and start from scratch, just in case.

Then I am trying my query, directly taken from the documentation:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilaritySparse(params.queryVector, doc[embedding])",
        "params": {
          "queryVector": {"1703": 0.0261, "1698": 0.0261, "2283": 0.0459, "2263": 0.0523, "3741": 0.0349}
        }
      }
    }
  }
}'

But I am getting an illegal_argument_exception with reason Variable [embedding] is not defined.

What am I doing wrong?

You need to add quotes around the field name: doc['embedding']. You should also add 1 to the cosine similarity to prevent the score from being negative, as advised in this documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-script-score-query.html#vector-functions
cosineSimilaritySparse(params.queryVector, doc['embedding']) + 1.0
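As a rough sketch of the complete request body with both changes applied, the query can be built in Python and serialized to JSON. The function name build_script_score_query and its field parameter are hypothetical; the body structure and the query vector are copied from the question.

```python
import json


def build_script_score_query(query_vector, field="embedding"):
    """Build a script_score body with a quoted field name and a +1.0 offset."""
    # The field name must be a quoted string inside doc[...].
    source = f"cosineSimilaritySparse(params.queryVector, doc['{field}']) + 1.0"
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"queryVector": query_vector},
                },
            }
        }
    }


query_vector = {"1703": 0.0261, "1698": 0.0261, "2283": 0.0459, "2263": 0.0523, "3741": 0.0349}
body = build_script_score_query(query_vector)
print(json.dumps(body, indent=2))
```

If you stay with curl, note that the single quotes in doc['embedding'] would end a single-quoted -d '...' argument early. Saving the printed JSON to a file (say query.json, a name chosen here for illustration) and passing it with -d @query.json avoids that.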

Guess what? That solved it! I had taken out and put back those quotes a thousand times to no effect, but it seems I hadn't tried it again after I finally fixed the mapping, which took me a lot of effort.

And I have taken the chance to add the one, as recommended. A thousand thanks!
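
For anyone wondering why the + 1.0 matters: cosine similarity ranges from -1 to 1, so without the offset a score can come out negative. Here is a minimal Python sketch of cosine similarity over sparse vectors stored as dicts. It is my own illustration, not Elasticsearch's implementation, and the function name cosine_similarity_sparse is hypothetical.

```python
import math


def cosine_similarity_sparse(a, b):
    """Cosine similarity of two sparse vectors given as {dimension: weight} dicts."""
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(weight * b[dim] for dim, weight in a.items() if dim in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat an empty or all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


opposite = cosine_similarity_sparse({"1": 1.0}, {"1": -1.0})  # vectors pointing in opposite directions
disjoint = cosine_similarity_sparse({"1": 1.0}, {"2": 1.0})   # no shared dimensions

print(opposite, opposite + 1.0)
print(disjoint, disjoint + 1.0)
```

Adding 1.0 shifts the range to 0 through 2, so the worst possible match scores 0 instead of a negative value.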
