Cutoff Frequency alternatives

Hello,

I'm using Elasticsearch as the search engine for several web applications, and I have been using cutoff_frequency to ensure good relevance in two cases:

  • for most of our queries, to reduce the score contribution of high-frequency terms while still taking them into account;
  • as a "pure stopwords query" for queries that contain only high-frequency terms.

This way we don't have to maintain a stopword list: high-frequency terms are detected automatically.
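To illustrate the idea behind this, here is a minimal Python sketch of how a cutoff splits query terms into low- and high-frequency groups. It follows my understanding of the documented cutoff_frequency semantics (a value in [0, 1) is a fraction of the total number of documents, a value of 1 or more is an absolute document count); the function name and the sample document frequencies are hypothetical, and this is not Elasticsearch's actual code.

```python
import math


def split_by_cutoff(doc_freqs, total_docs, cutoff_frequency):
    """Split query terms into (low_freq, high_freq) lists.

    doc_freqs maps each query term to the number of documents containing it.
    A cutoff below 1 is treated as a fraction of total_docs; a cutoff of 1
    or more is treated as an absolute document count.
    """
    if cutoff_frequency < 1:
        max_df = math.ceil(cutoff_frequency * total_docs)
    else:
        max_df = cutoff_frequency
    low, high = [], []
    for term, df in doc_freqs.items():
        # Terms found in more documents than the cutoff act as "stopwords".
        (high if df > max_df else low).append(term)
    return low, high


low, high = split_by_cutoff(
    {"the": 9000, "of": 8500, "elasticsearch": 120, "cutoff": 40}, 10000, 0.05
)
print(low, high)
```

When `low` comes back empty, the query consists only of high-frequency terms, which is the "pure stopwords query" case described above.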

This approach has given us good results for years, but I just noticed that cutoff_frequency was deprecated in 7.3.0 :frowning:

It's now unclear to me whether this mechanism has been replaced by anything. From what I can gather from the GitHub repo and the documentation, cutoff_frequency has been replaced by "magic" and everything works "out of the box". As the deprecation message puts it: "you can omit this option, the [multi_match] query can skip block of documents efficiently if the total number of hits is not tracked".
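For comparison, here is a hedged Python sketch of the two request bodies as plain dicts: the pre-7.3 style with cutoff_frequency on a multi_match query, and the style the deprecation message seems to suggest, which drops the option and sets track_total_hits to false. The helper name and the 0.001 cutoff value are illustrative assumptions only.

```python
def build_search_body(text, fields, size=10, legacy=False):
    """Return a search request body as a plain dict.

    legacy=True: pre-7.3 style, relying on cutoff_frequency.
    legacy=False: option omitted and total-hit tracking disabled, which is the
    condition under which the deprecation message says blocks can be skipped.
    """
    multi_match = {"query": text, "fields": fields}
    body = {"size": size, "query": {"multi_match": multi_match}}
    if legacy:
        multi_match["cutoff_frequency"] = 0.001  # illustrative value only
    else:
        body["track_total_hits"] = False
    return body


old_body = build_search_body("the history of search", ["title", "body"], legacy=True)
new_body = build_search_body("the history of search", ["title", "body"])
```

As far as I understand, track_total_hits also accepts an integer threshold, which counts hits exactly only up to that number.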

Some statements are also unclear to me, in particular that everything now magically relies on the "max_score", as stated here: https://github.com/elastic/elasticsearch/issues/37096

"However in Elasticsearch 7 we have another alternative that can automatically skip non-competitive documents based on the maximum score of each term in the query. This new method doesn't require any configuration (no cutoff frequency) and can be much faster than the common_terms alternative if the total number of hits that matches the query is not tracked."

I don't understand how this "thing" is supposed to work. If somebody could provide additional insight into it, I'd really appreciate it.
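To make the quoted idea more concrete, here is a deliberately simplified Python sketch of max-score pruning for top-k retrieval: each term has an upper bound (its maximum score), and a document whose summed upper bounds cannot beat the current k-th best score is skipped without being scored. This is a toy model assuming in-memory postings of the form term -> {doc_id: score}; it is not Lucene's block-max implementation, which works on blocks of postings rather than single documents.

```python
import heapq


def top_k_maxscore(postings, k):
    """Return (top-k list of (score, doc_id) sorted descending, skipped count)."""
    # Upper bound per term: the best score it contributes to any document.
    max_scores = {t: max(p.values()) for t, p in postings.items() if p}
    terms = sorted(max_scores, key=max_scores.get)
    heap = []  # min-heap holding the current top-k (score, doc_id) pairs
    skipped = 0
    for doc in sorted({d for p in postings.values() for d in p}):
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Best score this document could possibly reach.
        bound = sum(max_scores[t] for t in terms if doc in postings[t])
        if bound <= threshold:
            skipped += 1  # non-competitive: never scored, and never counted
            continue
        score = sum(postings[t].get(doc, 0.0) for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), skipped


postings = {
    "the": {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5},  # frequent, low-scoring term
    "elasticsearch": {2: 2.0, 4: 1.5},         # rare, high-scoring term
}
print(top_k_maxscore(postings, 1))
```

The skipped documents are exactly the ones an exact hit count would still have to visit, which is presumably why the deprecation message ties the speed-up to not tracking the total number of hits.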

Based on this admittedly incomplete understanding of how things work since 7.3, here is my situation:

  • we always track the total number of hits, because we build pagination from the search results;
  • we always get max_score: null, because we don't sort on _score alone but also on some document fields that can take precedence over _score (to pin certain documents at the top).
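For reference, here is a hedged Python sketch of the kind of request body these two points imply, plus a page-count helper. The pinned field name is a hypothetical placeholder. The helper reads the hits.total object ({"value": ..., "relation": ...}) returned in 7.x responses, where a relation of "gte" means the value is only a lower bound.

```python
def build_paginated_body(text, fields, page, per_page, pin_field="pinned"):
    """Request body for one results page; pin_field is a hypothetical field."""
    return {
        "from": (page - 1) * per_page,
        "size": per_page,
        "query": {"multi_match": {"query": text, "fields": fields}},
        # Pinned documents first, then relevance.
        "sort": [{pin_field: {"order": "desc"}}, "_score"],
        # Exact total needed for pagination: every match must be counted.
        "track_total_hits": True,
    }


def page_count(total, per_page):
    """Return (number of pages, whether the total is exact)."""
    pages = -(-total["value"] // per_page)  # ceiling division
    return pages, total["relation"] == "eq"


body = build_paginated_body("the history of search", ["title", "body"], page=3, per_page=20)
print(page_count({"value": 42, "relation": "eq"}, 10))
```

With this setup I'd expect no skipping at all: the exact count forces every matching document to be visited, and the primary sort is not on _score. That is my reading, not something I have verified.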

So my question is: given these two points, is it safe to upgrade to 7.3 and remove all existing cutoff_frequency usage without losing relevance in the search results?

Regards