Update by Query on Object not in source

Hello,

(Note when I say parent/child below, I'm talking about objects not actual parent/child relationships)

I'm attempting to perform an update using _update_by_query. However, my documents have several multi-tiered objects where the parent won't always exist. When the script tries to update a document whose parent is missing, it throws a null_pointer_exception. Note: some of the documents have the parent with child fields and some don't. Is there a way to ensure every matching document is updated with the new fields without deleting any other child fields (like 'name' below)?

I ask because all my update-by-doc upserts work fine. It feels like there should be a way, which I'm missing, to essentially perform an upsert with _update_by_query (i.e. create the parent object when creating children if it doesn't exist). [Edit: I realize this isn't an upsert so much as a lazy creation of the object's parent on update.]

Below are a minimal example, the error, and my hack around it, which overwrites fields I want to keep. Thank you for any help you can provide!
Patrick

DELETE objectexample
PUT objectexample
PUT objectexample/_mapping/_doc
{
  "properties": {
    "top_field": { "type": "keyword" },
    "user": {
      "properties": {
        "email": { "type": "keyword" },
        "name": { "type": "keyword" }
      }
    }
  }
}

POST objectexample/_doc/1/_update
{
  "doc": {
    "top_field": "blah"
  },
  "doc_as_upsert": true
}

POST objectexample/_doc/2/_update
{
  "doc": {
    "top_field": "blah",
    "user": {
      "name": "Bob Jones"
    }
  },
  "doc_as_upsert": true
}

# This doesn't work, throws error seen below
POST objectexample/_update_by_query
{
  "query": {
    "term": {
      "top_field": "blah"
    }
  },
  "script": {
    "source": "ctx._source.user.email = params.email",
    "params": {
      "email": "abc@xyz.com"
    }
  }
}

# No email shown in document
GET objectexample/_doc/1

# This works, sort of: it replaces the whole 'user' object, deleting existing fields such as 'name', which I don't want.
POST objectexample/_update_by_query
{
  "query": {
    "term": {
      "top_field": "blah"
    }
  },
  "script": {
    "source": "ctx._source.user = params.user",
    "params": {
      "user": { "email": "abc@xyz.com"}
    }
  }
}

Error from the first _update_by_query request (the one that sets ctx._source.user.email):

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "ctx._source.user.email = params.email",
          "                ^---- HERE"
        ],
        "script": "ctx._source.user.email = params.email",
        "lang": "painless"
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [
      "ctx._source.user.email = params.email",
      "                ^---- HERE"
    ],
    "script": "ctx._source.user.email = params.email",
    "lang": "painless",
    "caused_by": {
      "type": "null_pointer_exception",
      "reason": null
    }
  },
  "status": 500
}

This is how I'd like the two documents to look:

# _id: 1
{
    "top_field" : "blah",
    "user" : {
        "email" : "abc@xyz.com"
    }
}

# _id: 2
{
    "top_field" : "blah",
    "user" : {
        "name" : "Bob Jones",
        "email" : "abc@xyz.com"
    }
}
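
One possible way to get that result, offered as an untested sketch rather than a confirmed fix: the null_pointer_exception comes from dereferencing ctx._source.user on documents where 'user' is absent, so the script could create an empty object first and only then set the child field. This assumes the same index, query, and params as the example above, and that a plain null check in Painless is acceptable here.

```
# Sketch: lazily create 'user' when it is missing, then set only 'email'
POST objectexample/_update_by_query
{
  "query": {
    "term": {
      "top_field": "blah"
    }
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.user == null) { ctx._source.user = new HashMap(); } ctx._source.user.email = params.email;",
    "params": {
      "email": "abc@xyz.com"
    }
  }
}
```

Because the script assigns only user.email, an existing user.name should be left untouched, and documents without 'user' should end up with an object containing just 'email'. For deeper nesting, each missing level would need its own null check.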
