Use distance on dense vectors in relevance score (at query time)

I use Elasticsearch to combine two things:

  • search in text
  • score based on dense vector (cosine similarity)

I use a function_score query. The first part is the text search (which produces a score), and then a script is applied to compute the cosine similarity.

My problem is that the cosine similarity is not computed during the query phase, so the text search acts as a pre-filter. I always get only documents that match the text search, even when other documents have a better cosine similarity.

This is the standard behavior of function_score according to the docs:

The function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

I would like to compute the cosine similarity at query time and combine that score with the text search score, giving both equal importance.

Thanks!

You will find a gist here describing the problem with a "real" example.

Note: this post is also on stackoverflow

Hi there!
Did I understand your intention correctly? You want to go through all documents and apply the cosine similarity function to each of them. You also have a query, and for the documents that match it you want to calculate a score for that query. Then you want to combine the two scores, one from the cosine similarity and one from the query?
Currently, you can do that with a bool query using should clauses like this:

GET my_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_text": {
              "query": "abc"
            }
          }
        },
        {
          "script_score": {
            "query": { "match_all": {} },
            "script": {
              "source": "50 * cosineSimilarity(params.query_vector, doc['my_vector']) + 1.0",
              "params": {
                "query_vector": [0, 0, 1]
              }
            }
          }
        }
      ]
    }
  }
}

This gives you the sum of the two scores: score1 + score2. You can also apply a boost to either query.
We also plan to develop a compound query that will let you combine query scores in ways other than a sum, but it is not available yet.
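
To make the sum concrete, here is a small Python sketch of how the two should clauses add up for each document. It is only an illustration, not how Elasticsearch computes scores internally: the document names, the text scores and the text_boost / vector_boost parameters are made up for the example, and it assumes a boost simply multiplies the score of its clause. Only the formula 50 * cosine + 1.0 is taken from the script in the query above.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_score(query_vector, doc_vector):
    # Same formula as the script in the query: 50 * cosineSimilarity(...) + 1.0
    return 50 * cosine_similarity(query_vector, doc_vector) + 1.0


def combined_score(text_score, query_vector, doc_vector,
                   text_boost=1.0, vector_boost=1.0):
    # A document that does not match the text clause (text_score is None)
    # gets no contribution from it, but is still scored by the vector clause
    # because that clause wraps match_all.
    # Assumption: a boost multiplies the score of its clause.
    text_part = 0.0 if text_score is None else text_boost * text_score
    return text_part + vector_boost * vector_score(query_vector, doc_vector)


# Hypothetical documents; the text scores are invented, not real BM25 output.
docs = {
    "doc_a": {"text_score": 2.0, "vector": [1, 0, 0]},   # matches the text, unrelated vector
    "doc_b": {"text_score": None, "vector": [0, 0, 1]},  # no text match, same direction as the query
}
query_vector = [0, 0, 1]

scores = {
    name: combined_score(d["text_score"], query_vector, d["vector"])
    for name, d in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, scores[name])
```

In this toy data, doc_b does not match the text at all but still ranks first, because the vector clause scores every document. That is the difference from the function_score setup in the question, where only documents matching the text query were scored.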

Thank you, that was exactly what I was looking for. I didn't know that I could combine queries like this.

If you want, you can copy your answer to Stack Overflow :)
