Fuzzy queries relevance score detailed explanation

full_vlad · January 19, 2016, 9:06pm

In what step of the relevance scoring phase do fuzzy queries apply the Levenshtein formula?

I am asking this because I read here that the steps for relevance scoring include TF-IDF, vector space model and other features like a coordination factor, field length normalization, and term or query clause boosting.

Where exactly does applying Levenshtein (or Damerau-Levenshtein) occur and, most importantly, where does the fuzziness come from? What is actually fuzzy about fuzzy queries? Is it related to fuzzy logic in any way?

Thanks in advance!

Mark_Harwood · January 20, 2016, 10:56am

Fuzzy queries take a single user-provided term and expand it into several Lucene TermQuery variants, each of which is boosted with a score that reflects its edit distance from the original term (the boost for a non-fuzzy query term is usually 1.0, i.e. no boosting effect). These boosts used to be mixed in with the usual Lucene IDF ranking, but to ill effect [1]. Modern versions of fuzzy query now "lie" about the document frequencies of the auto-expanded term variants to prevent IDF issues like the one linked in [1].
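
To make the expansion step concrete, here is a rough Python sketch of the idea, not the actual Lucene code. The function names (levenshtein, expand_fuzzy), the toy vocabulary, and the boost formula 1 - distance / min(length of the two terms) are all assumptions chosen for illustration; the real implementation walks the term dictionary with an automaton rather than comparing every term.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expand_fuzzy(term: str, vocabulary: list[str], max_edits: int = 2):
    """Return (variant, boost) pairs for indexed terms within max_edits.

    Hypothetical helper: the boost formula below is an assumption for
    illustration, picked so that an exact match gets 1.0 and the boost
    shrinks as the edit distance grows.
    """
    variants = []
    for candidate in vocabulary:
        if not candidate:
            continue  # skip empty terms to avoid dividing by zero
        distance = levenshtein(term, candidate)
        if distance <= max_edits:
            shortest = min(len(term), len(candidate))
            boost = max(0.0, 1.0 - distance / shortest)
            variants.append((candidate, boost))
    # Highest boost first; ties keep vocabulary order (sort is stable).
    return sorted(variants, key=lambda pair: -pair[1])


if __name__ == "__main__":
    indexed_terms = ["boss", "bose", "bass", "boat", "cat"]
    for variant, boost in expand_fuzzy("boss", indexed_terms, max_edits=1):
        print(f"{variant}: boost {boost:.2f}")
```

In this sketch each returned pair stands in for one boosted TermQuery variant; the exact match keeps a boost of 1.0 and every one-edit variant gets a lower one, which is the only place the edit distance enters the score.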

[1] When searching for 'Boss' with fuzziness, get higher score for 'Bose' than 'Boss'. ? How Comes !?!?
