I was actually excited to dig into an experiment at work using the common terms query. I see however, it's been deprecated as the performance benefits are seen now in the match query, as mentioned in this issue.
However, this query provides relevance functionality I don't think I can get in a match query (please correct me if I'm wrong). And the reason for deprecation seems to relate to max score / block WAND.
I've never understood common_terms to be about speed, but more about relevance. I was surprised this didn't come up in the deprecation discussions. Specifically, what I like about it is being able to set a cutoff frequency and, based on document frequency, make low-frequency terms mandatory (default AND) and high-frequency terms optional (default OR).
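For readers who haven't used it, here's a minimal sketch of the kind of query I mean (the `title` field name and the 0.001 cutoff are just illustrative values, and the dict-body style assumes the Python client):

```python
import json

def common_terms_query(field, text, cutoff=0.001):
    """Build a (now-deprecated) common terms query body.

    Terms whose document frequency is above `cutoff` are treated as
    high-frequency (optional here); the rest are low-frequency and
    combined with AND via low_freq_operator.
    """
    return {
        "query": {
            "common": {
                field: {
                    "query": text,
                    "cutoff_frequency": cutoff,
                    "low_freq_operator": "and",   # low-DF terms are mandatory
                    "high_freq_operator": "or",   # high-DF terms stay optional
                }
            }
        }
    }

print(json.dumps(common_terms_query("title", "blue suede jacket"), indent=2))
```

The point is that the engine decides per term, at query time, whether a token is "low value" based on its document frequency.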
Am I missing where I can do this with other functionality now?
Do you think the same effect can be achieved by tuning the BM25 parameters (e.g. b), with or without minimum_should_match?
Paging Dr. @jimczi for more advice and perhaps historical context.
Thanks @joshdevins - yeah, optimizing BM25 and min-should-match params has been a big focus of our work. However, with minimum_should_match we can't choose which tokens to remove; it just provides a hard floor on the number of tokens. For example, in a product search, if someone searches for "blue suede jacket" I'd prefer that "blue" be made optional (presumably high DF) but "suede" and "jacket" mandatory. So for the next round of experimentation, I had hoped to turn to common_terms as opposed to maintaining or computing a specific list of these low-value terms outside the search engine.
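To make the distinction concrete, here's a rough client-side sketch of what common_terms effectively did: partition the query terms by document frequency, then make low-DF terms mandatory and high-DF terms optional in a bool query. The doc-frequency numbers below are made up, and in practice this needs term statistics that a search service may not expose:

```python
def partition_by_df(terms, doc_freq, cutoff):
    """Split terms into (low_df, high_df) around an absolute DF cutoff."""
    low = [t for t in terms if doc_freq.get(t, 0) <= cutoff]
    high = [t for t in terms if doc_freq.get(t, 0) > cutoff]
    return low, high

def build_bool_query(field, low, high):
    """Low-DF terms become required clauses; high-DF terms only boost."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {field: t}} for t in low],
                "should": [{"term": {field: t}} for t in high],
            }
        }
    }

# Hypothetical document frequencies for the example query.
df = {"blue": 120000, "suede": 800, "jacket": 15000}
low, high = partition_by_df(["blue", "suede", "jacket"], df, cutoff=50000)
query = build_bool_query("title", low, high)
# "blue" ends up optional; "suede" and "jacket" are mandatory.
```

minimum_should_match can't express this, because it only counts how many clauses match, not which ones.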
I always looked at this from a different angle. The question is not whether a term should be mandatory or not but rather how much it contributes to the overall score. The dance with minimum_should_match, common_terms and all these advanced options is there to minimize the impact of always searching all terms. So when WAND was introduced, I thought that it could be a chance to simplify things further. Users can opt for the default behavior of considering all terms optional and at the same time rely on internal optimizations to ensure that we'll not consider all documents eagerly.
I always found minimum_should_match and common_terms difficult to approach. Finding the right configuration is tricky and whatever you find must be updated as the data and queries evolve. So that's a lot of burden for users that "just" want search to surface the most relevant results automatically.
Thanks @jimczi - appreciate the response
The specific case I'm working on: if you search for 'blue suede jacket' and there are, say, only two exact matches, you'll see those 'blue AND suede AND jacket' matches first. Below those two there are spurious matches on just blue or suede, so even above the fold you'll see some irrelevant results.
I was hoping to use common_terms to be a little smarter about making some terms mandatory (like jacket) and others optional (blue, suede, perhaps), so some of these lower down results would drop off and not be shown at all.
Of course, if there are other ideas on how to tamp down the recall a bit, I'd be open to them... I know I could reissue the query, relaxing it strategically, or do a bit more processing before the query hits Elasticsearch. But I don't have access to doc freq and other index stats in the search service, which would help in making this decision.
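The "reissue and relax" idea could be sketched like this: run the strict AND query first, and fall back to OR only when there are too few hits. `run_search` here is a stand-in for whatever client call you actually use, and the fake backend below just simulates the two-exact-matches case:

```python
def match_query(field, text, operator):
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}

def search_with_fallback(run_search, field, text, min_hits=3):
    """Try the strict AND query first; relax to OR only if recall is too low."""
    strict_hits = run_search(match_query(field, text, "and"))
    if len(strict_hits) >= min_hits:
        return strict_hits
    return run_search(match_query(field, text, "or"))

# Fake backend: pretend the AND query finds only two exact matches.
def fake_run_search(body):
    op = body["query"]["match"]["title"]["operator"]
    return ["doc1", "doc2"] if op == "and" else ["doc1", "doc2", "doc3", "doc4"]

hits = search_with_fallback(fake_run_search, "title", "blue suede jacket")
```

The trade-off is an extra round trip on sparse queries, and unlike common_terms it relaxes all terms at once rather than just the high-DF ones.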
Hope that context makes sense. Anyway, it's up to you guys whether you want to deprecate it, I just wanted to share perhaps one use case where common_terms seems to help with relevance!
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.