Exactly which documents are used for vector calculation

I have read some of the Elasticsearch blog posts and documentation to understand how the vector calculation works.
However, I am confused about which documents are actually used for the vector calculation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.4/query-dsl-script-score-query.html#vector-functions

  1. If the match_all option is not used, are the documents used for the vector calculation chosen by the "text query"?
    In other words, does the term-based search first select the target documents from all documents, with the vector operation then run only against that restricted set?

  2. If the match_all option is used, is the vector calculation done for all documents?

Hello ray1, your overall understanding is correct.

Currently, vectors can only be used for scoring through Painless script functions like cosineSimilarity and l2norm. The way to use these functions for scoring during a search is to use a script_score query. The script_score query wraps another query and provides a new score for every document it matches.

When the wrapped query is a match_all (as in the blog post), every document in the index matches, so the vector calculation is performed against each one. However, the query could instead be a more restrictive one like term:

{
  "script_score": {
    "query": {"term": { "tags": "java" }},
    "script": {
      "source": "cosineSimilarity(params.query_vector, doc['title_vector']) + 1.0",
      "params": {"query_vector": query_vector}
    }
  }
}

With this query, we only consider documents with a tags field that contains java. The vector calculation only runs over this subset of documents.
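
To make the two cases concrete, here is a minimal in-memory Python sketch of the same idea. It only mimics the behaviour and is not how Elasticsearch implements scoring internally; the sample documents, the script_score helper, and the match_all / term_java functions are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical sample documents (not taken from the thread).
DOCS = [
    {"id": 1, "tags": ["java"], "title_vector": [1.0, 0.0]},
    {"id": 2, "tags": ["python"], "title_vector": [1.0, 0.0]},
    {"id": 3, "tags": ["java", "search"], "title_vector": [0.0, 1.0]},
]


def match_all(doc):
    """Stand-in for a match_all query: every document matches."""
    return True


def term_java(doc):
    """Stand-in for {"term": {"tags": "java"}}."""
    return "java" in doc["tags"]


def script_score(docs, matches, query_vector):
    """Score only the documents accepted by the wrapped query."""
    scored = []
    for doc in docs:
        if not matches(doc):
            continue  # filtered out: the vector calculation never runs
        score = cosine_similarity(query_vector, doc["title_vector"]) + 1.0
        scored.append((doc["id"], score))
    # Highest score first, like search hits.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query_vector = [1.0, 0.0]
all_hits = script_score(DOCS, match_all, query_vector)
java_hits = script_score(DOCS, term_java, query_vector)

print("match_all:", all_hits)
print("term java:", java_hits)
```

With match_all, all three sample documents get a score. With the term stand-in, document 2 is skipped before any vector work happens, even though its vector is identical to the query vector.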

Thank you for your reply.
