Minimum should match is not working

I have a query with an array of 'must' clauses. Two of those clauses each contain an array of 'should' clauses, as follows (just providing pseudocode here):

must: [
  { clause1 },
  { clause2 },
  { bool: {
      should: [
        { clause3 },
        { clause4 },
        { clause5 }
      ],
      "minimum_should_match" : "1"
    }
  },
  { bool: {
      should: [
        { clause6 },
        { clause7 }
      ],
      "minimum_should_match" : "1"
    }
  }
]

The requirement here is to build a series of AND conditions with inner ORs, as below:
clause1 AND clause2 AND (clause3 OR clause4 OR clause5) AND (clause6 OR clause7)

I'm using the Java High Level REST Client's BoolQueryBuilder (version 7.1.1).
When I set minimumShouldMatch(1), it is serialized as a String value, "minimum_should_match" : "1" (as shown above). My intention in providing it is to say that at least one clause must match in each of the should arrays.

But I'm seeing some records in the results where neither clause6 nor clause7 matches, meaning these records satisfy only 'clause1 AND clause2 AND (clause3 OR clause4 OR clause5)'.

It looks like the minimum should match value of 1 set on each of the should arrays is not being respected.

Can you please advise?

Can anyone help please? Is this an inherent bug?

Could you provide some real sample data to reproduce what you described?

In my explanation above, I simplified the clauses and kept only the problematic ones. Here are the details.
Basically, in the following query, I'm trying to identify documents whose 'code' is one of the 5 values provided in 'terms' and that have data matching either the category_1 or the category_2 conditions.
What we observe is that the date range queries for these categories are not respected. The results include documents that do not fall within the specified date ranges.

{
 "bool" : {
  "must" : [
   {
    "terms" : {
     "code" : [
      "xxxxxxxx1",
      "xxxxxxxx2",
      "xxxxxxxx3",
      "xxxxxxxx4",
      "xxxxxxxx5"
     ],
     "boost" : 1.0
    }
   },
   {
    "bool" : {
     "should" : [
      {
       "bool" : {
        "must" : [
         {
          "exists" : {
           "field" : "category_1.category_1_score",
           "boost" : 1.0
          }
         },
         {
          "range" : {
           "category_1.category_1_date" : {
            "from" : "2019-05-26T00:00:00.000Z",
            "to" : null,
            "include_lower" : true,
            "include_upper" : false,
            "time_zone" : "UTC",
            "boost" : 1.0
           }
          }
         },
         {
          "range" : {
           "category_1.category_1_date" : {
            "from" : null,
            "to" : "2019-07-25T00:00:00.000Z",
            "include_lower" : false,
            "include_upper" : true,
            "time_zone" : "UTC",
            "boost" : 1.0
           }
          }
         }
        ],
        "adjust_pure_negative" : true,
        "boost" : 1.0
       }
      },
      {
       "bool" : {
        "must" : [
         {
          "exists" : {
           "field" : "category_2.category_2_score",
           "boost" : 1.0
          }
         },
         {
          "range" : {
           "category_2.category_2_date" : {
            "from" : "2019-05-26T00:00:00.000Z",
            "to" : null,
            "include_lower" : true,
            "include_upper" : false,
            "time_zone" : "UTC",
            "boost" : 1.0
           }
          }
         },
         {
          "range" : {
           "category_2.category_2_date" : {
            "from" : null,
            "to" : "2019-07-25T00:00:00.000Z",
            "include_lower" : false,
            "include_upper" : true,
            "time_zone" : "UTC",
            "boost" : 1.0
           }
          }
         }
        ],
        "adjust_pure_negative" : true,
        "boost" : 1.0
       }
      }
     ],
     "adjust_pure_negative" : true,
     "minimum_should_match" : "1",
     "boost" : 1.0
    }
   }
  ],
  "adjust_pure_negative" : true,
  "boost" : 1.0
 }
}

We identified the issue in the way the query was being built.

The query returned the correct results after we made three changes: using the from() and to() methods of RangeQueryBuilder together when both limits were available, using gte() or lte() only when a single limit was available, and no longer setting includeUpper() and includeLower() explicitly.
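
For anyone landing here later, here is a minimal sketch of what the corrected category_1 clause might look like in the query DSL. It assumes both limits are combined into a single range clause with both bounds inclusive, which should be what from() and to() give when includeLower() and includeUpper() are left at their defaults. The gte and lte keys are the query DSL's standard range parameters. The JSON actually emitted by the 7.1.1 builder may look different (for example from/to with include_lower/include_upper), so treat this as an illustration rather than the exact output.

```json
{
  "bool": {
    "must": [
      { "exists": { "field": "category_1.category_1_score" } },
      {
        "range": {
          "category_1.category_1_date": {
            "gte": "2019-05-26T00:00:00.000Z",
            "lte": "2019-07-25T00:00:00.000Z"
          }
        }
      }
    ]
  }
}
```

When only one limit is known, the same range clause would carry just gte or just lte. The category_2 clause would follow the same pattern with its own field names.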

If the documentation had explained the use of each of these methods in more detail, it would have helped and saved time in narrowing down the cause of the problem.

Please consider this thread as resolved.
