Nested Field exists check not working as expected

GET index1/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "publish_details",
            "query": {
              "bool": {
                "must_not": [
                  {
                    "exists": {
                      "field": "publish_details.environment"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Query2

GET index1/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "publish_details",
            "query": {
              "bool": {
                "must": [
                  {
                    "exists": {
                      "field": "publish_details.environment"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Both gives the same result. I was expecting must exists and must_not exists to give mutually exclusive results but it is not.
Please let me know how both gives the same results? and clarify do i need to specify the field path in exists query? ie., publish_details.environment vs environement usage

Each nested sub-document is stored as a separate document behind the scenes and evaluated individually. It is therefore possible for a document to have some sub-documents that match and others that do not match the clause. All sub-documents do not need to match, so a document could match both examples.

Take this simple sample document as an example:

{
  "publish_details": [
    {
      "clause": "1",
      "environment": "A"
    },
    {
      "clause": "2"
    }
  ]
}

The first nested document matches the exists clause while the second does not. When you are using a nested sub clause I believe you are looking for any match so I suspect this document should match both your queries. If you however removed one of the clauses it should only match one.

In My case both the conditions returns same set of documents, I used must in one query must_not in another query.

Yes, I know. The nested clause is evaluated against all nested documents. In the example I provided above one will match and one will not match for both of your queries. As only one need to match you get the same result for both queries.

If you index the following documents into a test index you should be able to see what I mean:

{
  "publish_details": [
    {
      "clause": "1",
      "environment": "A"
    },
    {
      "clause": "2"
    }
  ]
}

{
  "publish_details": [
    {
      "clause": "1",
      "environment": "A"
    }
  ]
}

{
  "publish_details": [
    {
      "clause": "2"
    }
  ]
}

When you run your queries I would expect you to get 2 results for each query.

I have the same problem. In my sample there are documents that have a completely empty/null nested field and the documents still are not found with must_not:[nested ... exists]. The reverse works, though.

Hello David,
did you find any solution?

Sort of. I added a non-nested field on the parent document that is a count of the number of nested records. It's a hack but seems to be the only workaround for this issue.

using aggregation API? Performance impact would be there right?

No, it's a denormalized field that gets stored whenever a document is created or updated.

1 Like