Scoring in Exact and Phrase Matching


#1

Hi,

I'm fairly new to Elasticsearch and Lucene. I quickly went through the Definitive Guide and was able to understand how scoring is calculated for boolean, term and multi-term queries. The basic weighting is TF-IDF, and scoring is based on a custom vector space model (VSM). Depending on how the query is constructed, finalqueryscore = (booleanqueryscore + termscore1 + termscore2 + ...), where the boolean query score and the term scores are based on the custom VSM.
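For reference, here is a minimal sketch of Lucene's classic TF-IDF practical scoring function (`DefaultSimilarity`, the default in Lucene 5.x). It assumes a single field and no boosts, and it omits `queryNorm` (a per-query constant that does not change ranking) and the lossy one-byte encoding of field norms. All function names are hypothetical. One thing it makes visible: in this model a bool query's score is the sum of its matching clauses' scores scaled by `coord`, rather than a separate score added on top of them.

```python
import math

# Simplified sketch of Lucene's classic TF-IDF formulas (DefaultSimilarity).
# Assumptions: one field, all boosts = 1, queryNorm omitted, exact (unencoded) norms.

def tf(freq: float) -> float:
    # Term frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def idf(doc_freq: int, num_docs: int) -> float:
    # Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def length_norm(field_length: int) -> float:
    # Shorter fields weigh more: 1 / sqrt(number of terms in the field).
    return 1.0 / math.sqrt(field_length)

def term_score(freq: float, idf_value: float, field_length: int) -> float:
    # idf appears squared: once in the query weight, once in the document weight.
    return tf(freq) * idf_value * idf_value * length_norm(field_length)

def bool_should_score(clauses, num_docs: int, field_length: int) -> float:
    # clauses: hypothetical input, one (freq_in_doc, doc_freq) pair per query term.
    matched = [(f, df) for f, df in clauses if f > 0]
    if not matched:
        return 0.0
    total = sum(term_score(f, idf(df, num_docs), field_length) for f, df in matched)
    coord = len(matched) / len(clauses)  # fraction of query terms that matched
    return coord * total

# Two-term query where only the first term occurs (once) in a one-term field.
print(bool_should_score([(1, 9), (0, 5)], num_docs=10, field_length=1))  # 0.5
```

Here the matching term's own score is 1.0, and `coord` halves it because only one of the two query terms matched.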

However, I'm not very clear on what kind of scoring is used for exact and phrase matching. For an exact match, is the score always 1? Similarly, is phrasequeryscore = booleanqueryscore + termscore1 + proximity (edit distance) + ...?
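As far as I can tell from Lucene 5.x's `PhraseQuery` and `TFIDFSimilarity`, an exact phrase (slop 0) is scored like a single pseudo-term: its frequency is the number of times the whole phrase occurs in the document, and its idf is the sum of the individual terms' idfs. The sketch below assumes the classic TF-IDF formulas with no boosts and omits `queryNorm`; the function names are hypothetical. Under these assumptions an exact phrase match does not score a constant 1: the score still depends on phrase frequency, term rarity and field length.

```python
import math

# Sketch: exact-phrase scoring under classic TF-IDF assumptions (no boosts, no queryNorm).

def idf(doc_freq: int, num_docs: int) -> float:
    # Per-term idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def exact_phrase_freq(tokens, phrase) -> int:
    # Count start positions where all phrase terms appear at consecutive positions.
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)

def exact_phrase_score(tokens, phrase, doc_freqs, num_docs: int) -> float:
    freq = exact_phrase_freq(tokens, phrase)
    if freq == 0:
        return 0.0  # no exact occurrence: the document does not match at all
    # The phrase's idf is the sum of its terms' idfs, then treated like a term idf.
    phrase_idf = sum(idf(df, num_docs) for df in doc_freqs)
    # tf(freq) * idf^2 * lengthNorm, exactly as for a single term.
    return math.sqrt(freq) * phrase_idf ** 2 / math.sqrt(len(tokens))

tokens = "quick brown fox saw a quick brown dog".split()
print(exact_phrase_freq(tokens, ["quick", "brown"]))  # 2
print(exact_phrase_score(tokens, ["quick", "brown"], doc_freqs=[9, 9], num_docs=10))
```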

The only relevant information I found is: "Individual queries may combine the TF/IDF score with other factors such as the term proximity in phrase queries, or term similarity in fuzzy queries" [1]. How exactly is proximity combined?

  1. https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html#explain

(Nik Everett) #2

I'm not sure of a better place to look than the implementation:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/5.2.1/org/apache/lucene/search/SloppyPhraseScorer.java#71

That is, I don't remember seeing it documented better anywhere else. You still have to jump into things like DocScorer#computeSlopFactor to get the full picture.
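To illustrate how proximity is folded in: in Lucene 5.x's `SloppyPhraseScorer`, each phrase match whose length is within the slop adds `computeSlopFactor(matchLength)` to a per-document "sloppy frequency", and with `DefaultSimilarity` that factor is `1 / (distance + 1)`. The sloppy frequency then goes through the same tf/idf/norm formula as a plain term frequency, so proximity changes the frequency rather than being added as a separate term. The sketch below assumes match distances are already computed (finding them is the hard part of `SloppyPhraseScorer`), takes the phrase's combined idf as an input, omits boosts and `queryNorm`, and uses hypothetical function names.

```python
import math

def slop_factor(distance: int) -> float:
    # Mirrors DefaultSimilarity.sloppyFreq: an exact match (distance 0) counts 1,
    # looser matches count progressively less.
    return 1.0 / (distance + 1)

def sloppy_phrase_freq(match_distances, slop: int) -> float:
    # Hypothetical input: precomputed distances of each candidate phrase match.
    # Only matches within the allowed slop contribute.
    return sum(slop_factor(d) for d in match_distances if d <= slop)

def sloppy_phrase_score(match_distances, slop: int, phrase_idf: float,
                        field_length: int) -> float:
    freq = sloppy_phrase_freq(match_distances, slop)
    if freq == 0:
        return 0.0  # no match within the slop: the document does not match
    # The sloppy frequency is used exactly like a term frequency: tf * idf^2 * norm.
    return math.sqrt(freq) * phrase_idf ** 2 / math.sqrt(field_length)

# Matches at distances 0 and 1 count 1.0 + 0.5; distance 3 exceeds slop=2.
print(sloppy_phrase_freq([0, 1, 3], slop=2))  # 1.5
```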