Hello,
(Note: when I say parent/child below, I'm talking about objects and their fields, not actual parent/child relationships.)
I'm attempting to perform an update using _update_by_query. However, my documents have several multi-tiered objects where the parent won't always exist. When the script runs against a document whose parent is missing, it throws a null_pointer_exception. Note: some of the documents have the parent with child fields and some don't. Is there a way to ensure every matching document is updated with the new fields without deleting any other child fields (like 'name' below)?
I ask because all my doc-based update upserts work fine, so it feels like I'm missing a way to essentially perform an upsert with _update_by_query (i.e. create the parent object when creating children if it doesn't exist). [Edit: I realize this isn't an upsert so much as a lazy creation of the object's parent on update.]
Below is a minimal example, the error it produces, and my hack around it, which overwrites fields I want to keep. Thank you for any help you can provide!
Patrick
DELETE objectexample
PUT objectexample
PUT objectexample/_mapping/_doc
{
  "properties": {
    "top_field": { "type": "keyword" },
    "user": {
      "properties": {
        "email": { "type": "keyword" },
        "name": { "type": "keyword" }
      }
    }
  }
}
POST objectexample/_doc/1/_update
{
  "doc": {
    "top_field": "blah"
  },
  "doc_as_upsert": true
}
POST objectexample/_doc/2/_update
{
  "doc": {
    "top_field": "blah",
    "user": {
      "name": "Bob Jones"
    }
  },
  "doc_as_upsert": true
}
# This doesn't work; it throws the error shown below
POST objectexample/_update_by_query
{
  "query": {
    "term": {
      "top_field": "blah"
    }
  },
  "script": {
    "source": "ctx._source.user.email = params.email",
    "params": {
      "email": "abc@xyz.com"
    }
  }
}
# No email shown in document
GET objectexample/_doc/1
# This works, sort of, but it deletes the existing 'user' fields, which I don't want.
POST objectexample/_update_by_query
{
  "query": {
    "term": {
      "top_field": "blah"
    }
  },
  "script": {
    "source": "ctx._source.user = params.user",
    "params": {
      "user": { "email": "abc@xyz.com" }
    }
  }
}
Error from _update_by_query:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "ctx._source.user.email = params.email",
          " ^---- HERE"
        ],
        "script": "ctx._source.user.email = params.email",
        "lang": "painless"
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [
      "ctx._source.user.email = params.email",
      " ^---- HERE"
    ],
    "script": "ctx._source.user.email = params.email",
    "lang": "painless",
    "caused_by": {
      "type": "null_pointer_exception",
      "reason": null
    }
  },
  "status": 500
}
This is how I'd like the two documents to look:
# _id: 1
{
  "top_field" : "blah",
  "user" : {
    "email" : "abc@xyz.com"
  }
}
# _id: 2
{
  "top_field" : "blah",
  "user" : {
    "name" : "Bob Jones",
    "email" : "abc@xyz.com"
  }
}
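[Edit: One approach that might give the lazy parent creation described above is to guard against the missing object inside the script itself. The following is an untested sketch, not a confirmed fix. It assumes that 'user', when present in _source, is always an object (a map in Painless) and never a scalar or an array, and it uses the same index, query, and params as the failing request above. The only change is the null check that creates an empty 'user' map before assigning the email.]

```
# Untested sketch: create the 'user' object only when it is missing,
# then set the email without touching existing fields such as 'name'.
# Assumes 'user' is always an object when it exists.
POST objectexample/_update_by_query
{
  "query": {
    "term": {
      "top_field": "blah"
    }
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.user == null) { ctx._source.user = new HashMap(); } ctx._source.user.email = params.email",
    "params": {
      "email": "abc@xyz.com"
    }
  }
}
```

If this behaves as intended, document 1 would gain a 'user' object containing only 'email', while document 2 would keep 'name' and gain 'email'. Deeper hierarchies would presumably need one such check per level.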