How do I create a boolean "OR" filter?

I'd like to filter for EITHER (OR) of two values, but a filter appears to require (AND) a document to contain EACH value. I'm currently using a bool query, and should appears to be the only way to express an OR. However, I do not want relevance scores and I do want node query caching, which is supported by filter context (enabled by filter), not query context (enabled by should).

The only ways to get filter context are described here in the documentation:

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation. (Query and filter context | Elasticsearch Guide [8.11] | Elastic)

As you can see, the filter parameter is required.

You can use the should clause to create an OR filter.

Here is a complete example

DELETE discuss-test

PUT discuss-test
{
  "mappings": {
    "properties": {
      "type": {
        "type": "keyword"
      },
      "gender": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      }
    }
  }
}


POST discuss-test/_doc
{
  "type" : "feline",
  "gender" : "male",
  "name" : "chuck"
}

POST discuss-test/_doc
{
  "type" : "feline",
  "gender" : "female",
  "name" : "mary"
}

POST discuss-test/_doc
{
  "type" : "canis",
  "gender" : "male",
  "name" : "spot"
}

POST discuss-test/_doc
{
  "type" : "canis",
  "gender" : "female",
  "name" : "mary"
}


Now the query


# This is the equivalent of an OR filter: "should" means OR, "must" means AND
GET discuss-test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "type": {
              "value": "canis"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "mary"
            }
          }
        }
      ]
    }
  }
}

and the resulting OR.


{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.3862942,
    "hits" : [
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "U6tJdXsBm1AJFmUV717P",
        "_score" : 1.3862942,
        "_source" : {
          "type" : "canis",
          "gender" : "female",
          "name" : "mary"
        }
      },
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "UatJdXsBm1AJFmUV716l",
        "_score" : 0.6931471,
        "_source" : {
          "type" : "feline",
          "gender" : "female",
          "name" : "mary"
        }
      },
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "UqtJdXsBm1AJFmUV7162",
        "_score" : 0.6931471,
        "_source" : {
          "type" : "canis",
          "gender" : "male",
          "name" : "spot"
        }
      }
    ]
  }
}

This introduces scoring, however.

Yup, sorry, I misread and only just saw that you did not want scoring...

It took me a few minutes, but how about this?
Wrap the bool should inside the filter...

GET discuss-test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "type": "canis"
                }
              },
              {
                "term": {
                  "name": "mary"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Result is


{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "UatJdXsBm1AJFmUV716l",
        "_score" : 0.0,
        "_source" : {
          "type" : "feline",
          "gender" : "female",
          "name" : "mary"
        }
      },
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "UqtJdXsBm1AJFmUV7162",
        "_score" : 0.0,
        "_source" : {
          "type" : "canis",
          "gender" : "male",
          "name" : "spot"
        }
      },
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "U6tJdXsBm1AJFmUV717P",
        "_score" : 0.0,
        "_source" : {
          "type" : "canis",
          "gender" : "female",
          "name" : "mary"
        }
      }
    ]
  }
}
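
One caveat if you extend this: the inner bool behaves as an OR only because it contains nothing but should clauses, in which case minimum_should_match defaults to 1. If you later add a must or filter clause to that same inner bool, the default drops to 0 and the should clauses become optional. Here is a sketch that states the requirement explicitly, using the same test index (the explicit minimum_should_match is my addition; the query above does not need it):

GET discuss-test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              { "term": { "type": "canis" } },
              { "term": { "name": "mary" } }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}

This should return the same three documents as before.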

Perfect. That removes the scoring. I just hope it's not calculating the scores underneath still and then just dropping them. Do you know this by any chance?

Run it in the Query Profiler... it looks like constant score to me.

No, it does not look like it is actually scoring to me.

If you look at the details, there is no time on the actual score.
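
If you would rather not use the Kibana UI, the same information should be available from the search Profile API by setting "profile": true on the request. A minimal sketch that profiles the filter-wrapped query from above against the same test index:

GET discuss-test/_search
{
  "profile": true,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              { "term": { "type": "canis" } },
              { "term": { "name": "mary" } }
            ]
          }
        }
      ]
    }
  }
}

In the response, each entry under profile.shards[].searches[].query[] carries a breakdown object; the score and score_count values there are the ones to check. The exact set of breakdown keys may differ between versions, so compare against the output you actually get.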

You can even try a constant_score query for comparison... there is less time spent in the filter version than in that.

GET discuss-test/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "type": "canis" }
      },
      "boost": 1.2
    }
  }
}
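
The same wrapper should also work for the OR itself, since the filter parameter of constant_score is one of the places the documentation quoted above lists as filter context. An untested sketch against the same index, with the bool should from earlier placed inside it:

GET discuss-test/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "type": "canis" } },
            { "term": { "name": "mary" } }
          ]
        }
      }
    }
  }
}

With no boost set, I would expect every hit to come back with the default constant score of 1.0 rather than the 0.0 you get from the bool filter.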

Take a look...


Wow, that's great. Thank you.
