How to do "where field1 exists or field2 does not exist"


(Yehosef) #1

I have a case where I need to find bad data so I can correct it. A document is bad if it is either missing a field called "time" or has a field called "bad_data". I'm having trouble building a query that returns these documents without using the missing filter, which no longer exists.

I assume I should be using a bool-should at the top level because it's an OR. But from there, I'm stuck.
With the first clause, the query looks like:

GET hn_items/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "exists": {
            "field": "bad_data"
          }
        }
      ]
    }
  }
}

which works fine. But I'm not sure how to add the "missing" clause, which now has to be written as "must_not > exists". My attempt:

GET hn_items/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "exists": {
            "field": "bad_data"
          }
        },
        {
          "must_not": {
            "exists": {
              "field": "time"
            }
          }
        }
      ]
    }
  }
}

But this fails with a "no [query] registered for [must_not]" error. I then tried the following:

GET hn_items/_search
{
  "query": {
    "bool": {
      "should": {
        "exists": {
          "field": "bad_data"
        },
        "must_not": {
          "exists": {
            "field": "time"
          }
        }
      }
    }
  }
}

but then I got the good ol' "[exists] malformed query, expected [END_OBJECT] but found [FIELD_NAME]" error.

What am I doing wrong? How do I do this without the "missing" query?


(Alexander Reelsen) #2

The must_not clause needs to be wrapped inside another bool query to make this work. There is no standalone must_not query; must_not is only valid as a clause of a bool query.

Hope this helps.
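To make the wrapping concrete, here is a minimal sketch that builds the request body as a Python dict (the same JSON you would send from the console). The variable names `time_missing` and `search_body` are illustrative only; the point is that "time does not exist" becomes a bool query with a must_not clause, and that bool query then sits as one entry of the outer should list.

```python
import json

# must_not is a clause of bool, not a query type of its own,
# so "time does not exist" has to be expressed as bool -> must_not -> exists.
time_missing = {
    "bool": {
        "must_not": [
            {"exists": {"field": "time"}}
        ]
    }
}

# The wrapped query can now be one entry of the outer should list,
# giving the OR: bad_data exists, or time does not exist.
search_body = {
    "query": {
        "bool": {
            "should": [
                {"exists": {"field": "bad_data"}},
                time_missing,
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```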


(Yehosef) #3

Sorry for the delay in responding - thanks for the idea.

So it seems that this query works:

GET hn_items/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "exists": {
            "field": "bad_data"
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "time"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
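To see why this query gives the OR behaviour, the sketch below is a hypothetical, heavily simplified in-memory evaluator. It is not Elasticsearch and only supports `exists` and `bool` with `must`, `should` and `must_not` given as lists. It assumes the usual rules: a null value or an empty array does not count as the field existing, and when a bool query has no must clause, at least one should clause has to match. The sample documents are made up for illustration.

```python
def has_value(doc, field):
    # Approximates exists: null and empty arrays count as missing.
    value = doc.get(field)
    return value is not None and value != []


def matches(query, doc):
    if "exists" in query:
        return has_value(doc, query["exists"]["field"])
    if "bool" in query:
        clauses = query["bool"]
        must = clauses.get("must", [])
        should = clauses.get("should", [])
        must_not = clauses.get("must_not", [])
        if not all(matches(q, doc) for q in must):
            return False
        if any(matches(q, doc) for q in must_not):
            return False
        # No must clause (this sketch ignores filter): at least one should must match.
        if should and not must:
            return any(matches(q, doc) for q in should)
        return True
    raise ValueError(f"unsupported query: {query}")


query = {
    "bool": {
        "should": [
            {"exists": {"field": "bad_data"}},
            {"bool": {"must_not": [{"exists": {"field": "time"}}]}},
        ]
    }
}

# Hypothetical sample documents keyed by id.
docs = {
    "good": {"time": 1, "title": "ok"},
    "flagged": {"time": 1, "bad_data": True},
    "no_time": {"title": "no timestamp"},
    "null_time": {"time": None},
}

print([doc_id for doc_id, doc in docs.items() if matches(query, doc)])
# prints ['flagged', 'no_time', 'null_time']
```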

I'm a big believer in figuring out as much as possible from the docs. Where could I have found this in the docs?

Also, a bit of feedback: I understood that "missing" was deprecated because you can accomplish the same thing with "must_not > exists", but now I see that it's really "bool > must_not > exists", which is much more cumbersome (two levels was enough; three is too many, IMO).

Even if internally it makes more sense to have only an exists query, this is the kind of case where it would be nice to have syntactic sugar that expands "missing" into "bool > must_not > exists".
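Until something like that exists in the DSL, one option is to provide the sugar on the client side. The sketch below uses hypothetical helpers `exists`, `missing` and `any_of` (not part of Elasticsearch or any official client) that only build dicts, so `missing("time")` expands to the bool > must_not > exists form shown above.

```python
# Hypothetical client-side helpers: Elasticsearch no longer has a "missing"
# query, so these just build the equivalent query DSL as plain dicts.

def exists(field):
    return {"exists": {"field": field}}


def missing(field):
    # The only form the DSL accepts: bool -> must_not -> exists.
    return {"bool": {"must_not": [exists(field)]}}


def any_of(*clauses):
    # A bool query with only should clauses matches if at least one clause matches.
    return {"bool": {"should": list(clauses)}}


body = {"query": any_of(exists("bad_data"), missing("time"))}
print(body)
```

The design choice here is to keep the sugar purely structural: the helpers never talk to the cluster, so the generated body is identical to the hand-written query and can be sent with whatever client you already use.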

