Find documents which don't have some field in the doc

I'm trying to differentiate between documents that have some field but the value is not set, and the documents that don't have value. I was thinking about term query and script fields but both don't seem to let me achieve my goal. Could you advice?

My queries:

See in the output if title is missing:

GET /employess_with_missing/_search
{
  "_source": {
    "includes": ["*"]
  },
  "script_fields": {
    "is_title_missing": {
      "script": {
        "lang": "painless", 
        "source" : "params._source.containsKey('title')"
      }
    }
  }
}

sorta works

"hits" : [
      {
        "_index" : "employess_with_missing",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "Bob Smith"
        },
        "fields" : {
          "is_title_missing" : [
            false
          ]
        }
      },
      {
        "_index" : "employess_with_missing",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "name" : "John Smith",
          "title" : null,
          "age" : null
        },
        "fields" : {
          "is_title_missing" : [
            true
          ]
        }
      },
      {
        "_index" : "employess_with_missing",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "Susan Smith",
          "title" : "Dev Mgr",
          "age" : 33
        },
        "fields" : {
          "is_title_missing" : [
            true
          ]
        }
      },

but if I try to do same check in the query

GET /employess_with_missing/_search
{
  "_source": {
    "includes": ["*"]
  },
  "script_fields": {
    "is_title_missing": {
      "script": {
        "lang": "painless", 
        "source" : "params._source.containsKey('title')"
      }
    }
  },
    "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "params._source.containsKey('title')"
          }
        }
      }
    }
  }
}

I get null pointer exception

            "params._source.containsKey('title')",
            "              ^---- HERE"
          ],
          "script": "params._source.containsKey('title')",
          "lang": "painless",
          "caused_by": {
            "type": "null_pointer_exception",
            "reason": null


If I try to replace params._source with doc on the query filter, it seems to be just ignored

My test data

PUT /employess_with_missing/_doc/3?pretty
{
"name": "Bob Smith"
}

PUT /employess_with_missing/_doc/8?pretty
{
"name": "John Smith", "title": null, "age": null
}

PUT /employess_with_missing/_doc/4?pretty
{
"name": "Susan Smith", "title": "Dev Mgr", "age": 33
}

PUT /employess_with_missing/_doc/6?pretty
{
"name": "Jane Smith", "title": "Software Eng 2", "age": 25
}
1 Like

try to use the exists query in combination with a must_not part of a bool query

2 Likes

Thanks for response.

The query with must_not exists doesn't do what I want to achieve, because it returns both documents:

  1. where title is null
  2. where title is not present in the doc

While I want to be able to find documents there title is not present in the doc only.

GET /employess_with_missing/_search
{
  "_source": {
    "includes": ["name", "title"]
    }
    , 
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "title"
        }
      }
    }
  }
}

    "hits" : [
      {
        "_index" : "employess_with_missing",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "name" : "Bob Smith"
        }
      },
      {
        "_index" : "employess_with_missing",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 0.0,
        "_source" : {
          "name" : "John Smith",
          "title" : null
        }
      }
    ]
1 Like

by default null is treated the same way than a field not being existent, as nothing is stored in the inverted index. If you want to change this behaviour take a look at the null_value mapping parameter

--Alex

Thanks Alexander.

So to clarify my understanding:

Because there's nothing stored in the inverted index, nothing of the existing (exists, script) can differentiate non-existing field from the field with null value in common case, for query/filter contexts?

yes, that is correct.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.