Can minimum_should_match be 'boxed'?

Is it possible to use minimum_should_match to control an overabundance of should clauses in a boolean query?

For context:
I built a query builder that lets a user add multiple texts, each of which becomes an individual should clause in a boolean query. In some instances this produces tens of should clauses. The query is sorted by date descending (no scoring), and it tends to return 0 results when minimum_should_match makes too many of the should clauses 'required'.
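For illustration, the setup described above might look roughly like the following Python sketch. The helper `build_query`, the field names `description` and `published_at`, and the choice of `match_phrase` for each text are assumptions made for the example, not details from the actual query builder.

```python
import json


def build_query(texts, field="description", date_field="published_at",
                minimum_should_match="1<50%"):
    """Build a bool query with one should clause per user-entered text.

    The field names are hypothetical placeholders.
    """
    should = [{"match_phrase": {field: text}} for text in texts]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": minimum_should_match,
            }
        },
        # Newest first; relevance scores play no part in the ordering.
        "sort": [{date_field: {"order": "desc"}}],
    }


body = build_query(["vanilla", "chocolate", "strawberry"])
print(json.dumps(body, indent=2))
```

Because the sort ignores scores, minimum_should_match is the only thing deciding how specific a hit has to be, which is why its value matters so much here.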

The user is looking for any combination of the terms in a given query, but not necessarily all of them. For example, the user might search for any combination of "vanilla, chocolate, strawberry, butter pecan, cookie dough, rocky road, etc.", where more matches are better, but a result containing all of the terms is unlikely and a match on only 1 or 2 may not be specific enough.

I am aware that minimum_should_match can support multiple conditional specifications. However, what I am looking for is a way to put an upper bound on the number of clauses that minimum_should_match requires.

For example, I have used minimum_should_match: 1<50%, which works well for most queries (in most cases, users enter fewer than ten items). However, if the number of should clauses grows too large, the query returns zero results (e.g., with 20 should clauses, 50% means 10 of them are 'required').

I tried minimum_should_match: 1<50% 10<20% in an attempt to see if it is possible to box the should clauses to return reasonable results, but that appears not to work.
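As a sanity check on the arithmetic, here is a small Python model of the conditional rules as I read them in the Elasticsearch documentation. It is only a sketch, not Elasticsearch's actual implementation: `required_clauses` is a made-up helper, it assumes specs are written without spaces around `<`, and the third spec (`1<50% 10<5`) assumes a plain integer is accepted on the right-hand side the way the documented `9<-3` example suggests.

```python
def required_clauses(spec: str, clause_count: int) -> int:
    """Model the documented minimum_should_match rules for a clause count."""

    def simple(expr: str) -> int:
        # Plain integer or percentage; a negative value means
        # "this many (or this share) of the clauses may be missing".
        if expr.endswith("%"):
            percent = int(expr[:-1])
            amount = clause_count * abs(percent) // 100  # rounded down
            value = amount if percent >= 0 else clause_count - amount
        else:
            number = int(expr)
            value = number if number >= 0 else clause_count + number
        # Keep the result within 0..clause_count (a simplification of the
        # documented handling of out-of-range values).
        return max(0, min(value, clause_count))

    if "<" not in spec:
        return simple(spec.strip())

    result = clause_count  # at or below the first threshold, all are required
    for part in spec.split():
        threshold, expr = part.split("<", 1)
        if clause_count <= int(threshold):
            break
        result = simple(expr)
    return result


for spec in ("1<50%", "1<50% 10<20%", "1<50% 10<5"):
    print(spec, [required_clauses(spec, n) for n in (1, 4, 10, 20, 40)])
# 1<50% [1, 2, 5, 10, 20]
# 1<50% 10<20% [1, 2, 5, 4, 8]
# 1<50% 10<5 [1, 2, 5, 5, 5]
```

Under this model, a trailing percentage still grows with the number of clauses, while a trailing fixed integer behaves like a cap. Whether a real cluster evaluates the specs the same way would need to be checked against the actual queries.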

Any ideas or advice?

How about sorting by score with a minimum_should_match of 1, and offering date filters instead of sorting by date, e.g., “last week”, “last month”, etc.? That’s how Google search handles dates.
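A rough sketch of that idea in Python, in case it helps. The helper `build_scored_query` and the field names `description` and `published_at` are hypothetical, and the `since` values are ordinary Elasticsearch date-math strings standing in for the “last week” and “last month” filters.

```python
def build_scored_query(texts, since="now-7d/d", field="description",
                       date_field="published_at"):
    """Score-sorted bool query with a date filter instead of a date sort.

    Field names are placeholders chosen for the example.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: t}} for t in texts],
                # Any single matching text is enough; scoring ranks the rest.
                "minimum_should_match": 1,
                # Filter context: restricts by date without affecting scores.
                "filter": [{"range": {date_field: {"gte": since}}}],
            }
        }
        # No "sort" key: results come back ordered by _score by default.
    }


last_week = build_scored_query(["vanilla", "chocolate"])
last_month = build_scored_query(["vanilla", "chocolate"], since="now-1M/d")
```

Documents matching more of the texts score higher, so the "more is better" preference is handled by ranking rather than by a hard threshold.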

The requirements are for date order descending. I provide a scored version as well, with date filtering, but in this case the requirement is to return the most recent items.

That’s a shame.
Implementing this sort-by-date requirement means you have to invent quality filters that hide results below some arbitrary threshold. That logic is not easy for end users to understand or control. Sorting by score, by contrast, lets users page through all results and control the date aspect with self-explanatory filters.
