Nested query with should clauses and filter

I am using a nested query with should clauses in Elasticsearch 2.3.2:

GET shop/customer/_search
{
   "from": 0,
   "size": 25,
   "query": {
      "bool": {
         "must": [
            {
               "nested": {
                  "query": {
                     "bool": {
                        "should": [
                           {
                              "multi_match": {
                                 "query": "london",
                                 "fields": [
                                    "address.city",
                                    "address.street"
                                 ]
                              }
                           },
                           {
                              "term": {
                                 "address.postcode": "123"
                              }
                           }
                        ]                        
                     }
                  },
                  "path": "address"
               }
            }
         ]
      }
   }
}

It works fine, but then I add a filter clause to the inner bool query. I expect this to return only a subset of the previous results. Instead it returns more results: all documents with the given address.type, ignoring the should clauses:

GET shop/customer/_search
{
   "from": 0,
   "size": 25,
   "query": {
      "bool": {
         "must": [
            {
               "nested": {
                  "query": {
                     "bool": {
                        "should": [
                           {
                              "multi_match": {
                                 "query": "london",
                                 "fields": [
                                    "address.city",
                                    "address.street"
                                 ]
                              }
                           },
                           {
                              "term": {
                                 "address.postcode": "123"
                              }
                           }
                        ],
                        "filter": [
                           {
                              "terms": {
                                 "address.type": [
                                    "4b4372b7"
                                 ]
                              }
                           }
                        ]
                     }
                  },
                  "path": "address"
               }
            }
         ]
      }
   }
}

I checked the same thing with a non-nested query and it works as I expected: the should clauses return only the documents that contain the address, and adding the filter returns the subset of those results that match the filter.

Yeah, this is a subtle issue caused by how the new "filter" clause interacts with the rest of the bool query.

When a bool has only "should" clauses, at least one of them must match (otherwise every document in the index would match, since all the clauses would be optional). That's the situation your first query is in: it has only "should" clauses, so at least one of them must match.

If you were to add a real "must" clause, the "should" clauses revert to being entirely optional. Since a document has to match the "must" clause anyway, the optional clauses only exist for scoring: they boost a document's score if they match.

Now, the tricky bit: this also applies to the new "filter" clause. If you add a "filter", the "should" clauses revert to being entirely optional. So in your query, the only requirement is the "address.type" filter, while the other clauses only boost the score. That's why you're seeing more matching hits.
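To make this concrete: for matching purposes, the second query should behave like the filter-only nested query below (a sketch, assuming the same shop/customer index and nested address mapping as in the question). It should return the same hits; only the scores differ, because the multi_match and term clauses still boost the documents that happen to match them.

```json
{
   "from": 0,
   "size": 25,
   "query": {
      "bool": {
         "must": [
            {
               "nested": {
                  "query": {
                     "bool": {
                        "filter": [
                           {
                              "terms": {
                                 "address.type": [
                                    "4b4372b7"
                                 ]
                              }
                           }
                        ]
                     }
                  },
                  "path": "address"
               }
            }
         ]
      }
   }
}
```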

The fix depends on what you want. Do you want the filter + one of the should clauses to match? If that's the case, you could do:

{
  "from":0,
  "size":25,
  "query":{
    "bool":{
      "must":[
        {
          "nested":{
            "query":{
              "bool":{
                "must":[
                  {
                    "bool":{
                      "should":[
                        {
                          "multi_match":{
                            "query":"london",
                            "fields":[
                              "address.city",
                              "address.street"
                            ]
                          }
                        },
                        {
                          "term":{
                            "address.postcode":"123"
                          }
                        }
                      ]
                    }
                  }
                ],
                "filter":[
                  {
                    "terms":{
                      "address.type":[
                        "4b4372b7"
                      ]
                    }
                  }
                ]
              }
            },
            "path":"address"
          }
        }
      ]
    }
  }
}

This puts the two should clauses inside their own bool, which is placed inside the must clause of the original bool. So this reads as: "The filter must match AND at least one of the two should clauses must match."


Thanks. Would minimum_should_match=1 also be correct in this case?

Hah, oops. Yes, I think that would be the much simpler and more elegant solution :)

I'm about 95% sure that'll work, although give it a test to verify. Good call!
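For reference, here is a sketch of the minimum_should_match variant: it is the second query from the question with "minimum_should_match": 1 added to the inner bool, so at least one should clause has to match alongside the filter. Like the suggestion itself, it is untested here, so verify it against your own index.

```json
{
   "from": 0,
   "size": 25,
   "query": {
      "bool": {
         "must": [
            {
               "nested": {
                  "query": {
                     "bool": {
                        "should": [
                           {
                              "multi_match": {
                                 "query": "london",
                                 "fields": [
                                    "address.city",
                                    "address.street"
                                 ]
                              }
                           },
                           {
                              "term": {
                                 "address.postcode": "123"
                              }
                           }
                        ],
                        "minimum_should_match": 1,
                        "filter": [
                           {
                              "terms": {
                                 "address.type": [
                                    "4b4372b7"
                                 ]
                              }
                           }
                        ]
                     }
                  },
                  "path": "address"
               }
            }
         ]
      }
   }
}
```

Because both the filter and the should clauses sit inside the same nested query, they are evaluated against each address object separately: a customer matches only if a single address has the given type and also matches one of the should clauses.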