Vector space model reflected in practical scoring function


Hi, I'm new to ElasticSearch.

I read that ElasticSearch combines the boolean and the vector space
model at the following page.

I know that one can measure the angle between the query vector and the document vector in order to assign a relevance score to each document. I believe the cosine of that angle is what is called the cosine similarity, right?
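To make sure I have the textbook definition right, here is a minimal sketch of how I understand cosine similarity. The function name and the example vectors are my own invention and have nothing to do with ElasticSearch's internals; each vector position is assumed to hold the weight of one term.

```python
import math

def cosine_similarity(query_vec, doc_vec):
    """Cosine of the angle between two equal-length term-weight vectors."""
    # Numerator: dot product of the two vectors.
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    # Denominator: product of the two vector lengths (Euclidean norms).
    query_len = math.sqrt(sum(q * q for q in query_vec))
    doc_len = math.sqrt(sum(d * d for d in doc_vec))
    if query_len == 0 or doc_len == 0:
        return 0.0  # no angle is defined for an all-zero vector
    return dot / (query_len * doc_len)

# Made-up weights: the document points in the same direction as the query.
same_direction = cosine_similarity([1.0, 1.0, 0.0], [2.0, 2.0, 0.0])
# Made-up weights: the query and the document share no terms.
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Note that the denominator uses the length of both vectors, which is the part I cannot find in the practical scoring function.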

However, if I inspect the practical scoring function I don't see how the cosine similarity is present.

I believe the summation of the tf * idf^2 * boost * norm part for each term
represents the dot product in the cosine similarity.
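As I read it, that summation would be computed roughly as in the sketch below. This is only my interpretation: the helper name and the dictionary layout are hypothetical, and the tf, idf, boost and norm values are made up rather than taken from a real index.

```python
def term_sum(term_weights):
    """Sum tf * idf^2 * boost * norm over the terms of the query."""
    return sum(
        w["tf"] * w["idf"] ** 2 * w["boost"] * w["norm"]
        for w in term_weights
    )

# Made-up per-term values for a two-term query against one document.
terms = [
    {"tf": 1.0, "idf": 2.0, "boost": 1.0, "norm": 0.5},  # contributes 2.0
    {"tf": 2.0, "idf": 1.0, "boost": 1.0, "norm": 0.5},  # contributes 1.0
]
total = term_sum(terms)
```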

The queryNorm(q) must then represent the denominator. However, the formula for queryNorm seems to be 1/sqrt(sumOfSquaredWeights), where sumOfSquaredWeights is calculated by adding together the squared IDF of each term in the query. This only considers the terms present in the query, whereas the denominator of the cosine similarity also takes the terms in the document into account.
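Written out, my reading of the queryNorm formula is the sketch below. The function name and the IDF values are assumptions of mine for illustration; the point is that nothing about the document enters the calculation.

```python
import math

def query_norm(query_idfs):
    """1 / sqrt(sumOfSquaredWeights), using only the IDFs of the query terms."""
    sum_of_squared_weights = sum(idf ** 2 for idf in query_idfs)
    if sum_of_squared_weights == 0:
        return 0.0  # guard for an empty query (my own choice, not from the docs)
    return 1.0 / math.sqrt(sum_of_squared_weights)

# Made-up IDF values for a two-term query: 1 / sqrt(3^2 + 4^2) = 1 / 5
norm = query_norm([3.0, 4.0])
```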

The cosine similarity returns a value between 0 and 1 (where 1 is a 100% match). Why is this not the case
for ElasticSearch? How does it then use the vector space model?

Thanks in advance,

kind regards

