Allow vector functions in script fields context

Hi!

For a very specific use case, I use script fields to score different attributes of documents with custom Painless scripts, while the query context filters out non-relevant documents (replaced with a match_all query in the example below for simplicity).

Here is an example:

GET my_index/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "attribute_1": {
      "script": {
        "lang": "painless",
        "source": "cosineSimilarity(params.query_vector, 'dense_vector') + 1.0",
        "params": {
          "query_vector": [ ... ]
        }
      }
    },
    "attribute_2": {
      "script": {
        "lang": "painless",
        "source": "...",
        "params": {
          "some_param":  ... 
        }
      }
    }
  }
}

This example does not work because cosineSimilarity isn't allowed in the script_fields context. The following error is returned:

...
"caused_by" : {
    "type" : "illegal_argument_exception",
    "reason" : "Unknown call [cosineSimilarity] with [2] arguments."
}

This makes it impossible to use vector embeddings (e.g. sentence embeddings) to calculate a similarity for each of several attributes and return those scores in the results.

Now my question: why isn't this possible? I couldn't find a legitimate reason. Also, is there a workaround to get multiple scores for each hit using vector functions?

Indeed, vector functions are only available in the ScoreScript context. They are intended for scoring documents, and the use case you presented here is quite original. We have a related issue about exposing vector values in scripts; I can add your request there as something for us to consider.

is there a workaround to get multiple scores for each hit using vector functions

No. You can either combine the outputs of vector functions from multiple fields into a single Painless score, or issue multiple queries.
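
As a rough sketch of the first option, assuming two hypothetical dense_vector fields named vector_field_1 and vector_field_2 (neither name comes from the original example), a script_score query could sum both similarities into one score. Adding 1.0 per similarity keeps the result non-negative, as in the original script. The query vectors are left elided.

GET my_index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "lang": "painless",
        "source": "cosineSimilarity(params.query_vector_1, 'vector_field_1') + cosineSimilarity(params.query_vector_2, 'vector_field_2') + 2.0",
        "params": {
          "query_vector_1": [ ... ],
          "query_vector_2": [ ... ]
        }
      }
    }
  }
}

The hit's _score then holds only the sum; the individual similarities cannot be read back from it.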

Hi and thanks for answering!

Yes, my use case is probably more exotic than most, but I think Elasticsearch is perfect for something like this. Matching/filtering entities and also retrieving some sort of evaluation (in my case, scores between 0 and 1 for different attributes) isn't too far off. Sometimes you need or want additional information about the hits.

We can take Tinder as an example, since they also use Elasticsearch and, like me, use it for matching entities. If you want to show additional information about matches in the UI (e.g. how well a specific attribute matches), script fields are a nice way to achieve this. As machine learning becomes more common every day, vector functions like cosine similarity are used more and more in scenarios like this. Allowing them in the script fields context makes sense in my opinion. Maybe there is a downside I'm missing, though.

You mentioned multiple queries as an alternative. This can be a solution, but when response times are an important factor (e.g. Tinder), making multiple requests might take too long.

I don't quite get how this would work. How would I be able to get scores from multiple fields using vector functions and be able to separate them afterwards?

I also read the issue on GitHub. If you'd like, I can take part in the discussion there if that's easier for you guys.

I don't quite get how this would work. How would I be able to get scores from multiple fields using vector functions and be able to separate them afterwards?

You are right, it is not possible to separate individual scores – a script allows you to combine output from multiple vector functions into a single score.
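
If separate scores are needed, one possible way to limit the round trips of the multiple-queries alternative is the multi search API, which sends several searches in one request. This is only a sketch: vector_field_1 and vector_field_2 are hypothetical field names, the query vectors are left elided, and match_all stands in for the real filter, which would have to be the same in every search.

GET my_index/_msearch
{ }
{ "query": { "script_score": { "query": { "match_all": {} }, "script": { "source": "cosineSimilarity(params.query_vector, 'vector_field_1') + 1.0", "params": { "query_vector": [ ... ] } } } } }
{ }
{ "query": { "script_score": { "query": { "match_all": {} }, "script": { "source": "cosineSimilarity(params.query_vector, 'vector_field_2') + 1.0", "params": { "query_vector": [ ... ] } } } } }

Each entry in the responses array then carries its own _score per hit, and the per-attribute scores would have to be joined on _id on the client side. Because each search sorts by a different score, the top hits may differ between responses unless the filter and size are chosen so that both return the same documents.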

I also read the issue on GitHub. If you'd like, I can take part in the discussion there if that's easier for you guys.

I have added your request to the GitHub issue. Feel free to participate in it as well.
