Exactly which documents are used for vector calculation

ray1 · October 4, 2019, 2:53am

I have read some of the Elasticsearch blog posts and documentation to understand how the vector calculation works.
However, I am confused about which documents are included in the vector calculation.

https://www.elastic.co/guide/en/elasticsearch/reference/7.4/query-dsl-script-score-query.html#vector-functions

If match_all is not used, are the documents used for the vector calculation chosen by the text query?
In other words, does the term-based search first select the matching documents from all documents, with the vector operation then applied only to that restricted set?
If match_all is used, is the vector calculation done for every document?

Julie_Tibshirani · October 9, 2019, 4:43pm

Hello ray1, your overall understanding is correct.

Currently, vectors can only be used for scoring through Painless script functions like cosineSimilarity and l2norm. The way to use these functions for scoring during a search is to use a script_score query. The script_score query wraps another query and provides a new score for every document it matches.

When the wrapped query is a match_all (as in the blog post), then all documents are returned and the vector calculation is performed against each one. However, the query could instead be a more restrictive one like term:

{
  "script_score": {
    "query": {"term": { "tags": "java" }},
    "script": {
      "source": "cosineSimilarity(params.query_vector, doc['title_vector']) + 1.0",
      "params": {"query_vector": query_vector}
    }
  }
}

With this query, we only consider documents with a tags field that contains java. The vector calculation only runs over this subset of documents.
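
To make the difference concrete, here is a toy Python sketch of that behaviour. It is only a model of the semantics described above, not how Elasticsearch is implemented: the `script_score` helper, the `match_all` and `term_java` predicates, and the sample documents are all hypothetical names made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def script_score(docs, wrapped_query, query_vector):
    """Toy model: score only the documents the wrapped query matches."""
    scored = []
    for doc in docs:
        if not wrapped_query(doc):
            continue  # not matched, so no vector calculation for this doc
        score = cosine_similarity(query_vector, doc["title_vector"]) + 1.0
        scored.append((doc["id"], score))
    # Highest score first, as in a search response
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents
docs = [
    {"id": 1, "tags": ["java"], "title_vector": [1.0, 0.0]},
    {"id": 2, "tags": ["python"], "title_vector": [1.0, 0.0]},
    {"id": 3, "tags": ["java", "jvm"], "title_vector": [0.0, 1.0]},
]


def match_all(doc):
    return True


def term_java(doc):
    return "java" in doc["tags"]


query_vector = [1.0, 0.0]
all_hits = script_score(docs, match_all, query_vector)   # every doc is scored
java_hits = script_score(docs, term_java, query_vector)  # only docs tagged java
```

In this sketch, document 2 has a vector identical to the query vector, but it never appears in `java_hits` because the wrapped query excluded it before any similarity was computed.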

ray1 · October 15, 2019, 9:32am

Thank you for your reply.

system · November 12, 2019, 9:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
