Search over most frequent matches / terms without TF or IDF adjustment

pete4711 · September 18, 2015, 11:26am

Hi there,

we are working on a text-based search (via the famous "Type your search here" input box) that computes the score over multiple fields and shows the best results. It's basically a bool query with a mixture of "term" and "match" clauses over many different fields (using fuzziness, n-grams, edge n-grams and others).

We want the best results (meaning the most "popular" ones) to show up first and thus get the highest score. However, Lucene's default TF-IDF scoring gives us the exact opposite. Imagine you look for a vendor that appears in 30% of all index entries. It will have a very high document frequency, hence a very low IDF, and be ranked very low. We want exactly the opposite of that: give us the most frequent first(!).

Trying our luck with the "cross_fields" multi_match query did not work out, since we want to combine different query types with "bool".

Now, what we "found out" is that using Okapi BM25 with k1=0 and b=0 almost(?) behaves like a similarity that ignores TF/IDF. However we feel unsure if this really is the way to go.
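To make that observation concrete, here is a minimal sketch of the textbook BM25 formula in plain Python. It is not Lucene's actual code, and the function names and the numbers in the usage part are made up for illustration. Under these assumptions, k1=0 turns the term-frequency factor into a constant 1 for every matching document and b=0 switches off length normalization, but the IDF factor is untouched, so a term found in 30% of the documents still scores lower than a rare one.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 IDF as commonly defined: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency saturation with document-length normalization."""
    if freq == 0:
        return 0.0  # non-matching documents are not scored
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)


def bm25_score(freq, doc_len, avg_doc_len, doc_freq, doc_count, k1=1.2, b=0.75):
    """Score of one term in one document: IDF times the TF factor."""
    return bm25_idf(doc_freq, doc_count) * bm25_tf(freq, doc_len, avg_doc_len, k1, b)


# Hypothetical index: 1000 documents, average field length 20 terms.
N, AVG_LEN = 1000, 20

# With k1=0 and b=0 the TF factor no longer depends on freq or length.
tf_short = bm25_tf(freq=1, doc_len=10, avg_doc_len=AVG_LEN, k1=0, b=0)
tf_long = bm25_tf(freq=50, doc_len=500, avg_doc_len=AVG_LEN, k1=0, b=0)

# The IDF factor is still applied: common vendor (30% of docs) vs. rare vendor.
score_common = bm25_score(1, 10, AVG_LEN, doc_freq=300, doc_count=N, k1=0, b=0)
score_rare = bm25_score(1, 10, AVG_LEN, doc_freq=3, doc_count=N, k1=0, b=0)

print(tf_short, tf_long)
print(score_common < score_rare)
```

If this sketch matches what the similarity does, k1=0 and b=0 only remove the TF and length effects; ranking the most frequent terms first would still need the IDF part to be neutralized or inverted by some other means.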

Can you give us some feedback on that, please?

Is this the way to go, or is there something better for our "problem" waiting to be discovered?

Best regards
Peter
