Update plus With Script in Bulk


(Loreto Parisi) #1

I'm using the bulk API for update actions with the doc_as_upsert flag, in order to insert or update a document. This works fine. Now I would also like to run a script for new or updated documents. Should I use the script tag plus scripted_upsert, or only the script tag?


(Ryan Ernst) #2

I believe that is correct, according to the docs.


(Loreto Parisi) #3

Thank you. Another question about this: is it then possible to combine both?
I mean, I'm using update plus doc_as_upsert as described above. To update two documents in bulk I do:

{
    index: myIndex,
    type: '_doc',
    body: [
        { index: { _id: docItemID1 } },
        docItem1,
        { index: { _id: docItemID2 } },
        docItem2
    ]
}
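
The snippet above uses index actions, which replace the whole document. A minimal sketch of the update plus doc_as_upsert form mentioned in the text could look like the following; the helper name buildDocAsUpsertBody and the { id, doc } item shape are assumptions for illustration, not part of the client API.

```javascript
// Hypothetical helper: builds a bulk body of update actions that
// insert each document if missing, or merge it into the existing one.
function buildDocAsUpsertBody(items) {
  const body = [];
  for (const { id, doc } of items) {
    body.push({ update: { _id: id } });           // action line
    body.push({ doc: doc, doc_as_upsert: true }); // partial doc, used as upsert too
  }
  return body;
}

const docAsUpsertBody = buildDocAsUpsertBody([
  { id: "1", doc: { title: "first" } },
  { id: "2", doc: { title: "second" } }
]);
console.log(JSON.stringify(docAsUpsertBody, null, 2));
```

The resulting array would be passed as body to client.bulk, together with index and type, in the same way as the snippet above.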

With the script action I would like to update / append a new value to a tag field in the document item so something like:

{
    "scripted_upsert": true,
    "script" : {
        "source": "if (!ctx._source.tag.contains(params.tag)) { ctx._source.tag.add(params.tag); }",
        "lang": "painless",
        "params" : {
            "tag" : myNewTag
        }
    },
    "upsert" : {
        "tag" : ["red","green"]
    }
}

Now, I want to use scripted_upsert to get the best of both worlds, so I imagine something like this, if it is correct (that is my question):

    {
        "script" : {
            "source": "if (!ctx._source.tag.contains(params.tag)) { ctx._source.tag.add(params.tag); }",
            "lang": "painless",
            "params" : {
                "tag" : myNewTag
            }
        },
        "upsert" : docItem
    }

where docItem contains the tag item to be updated. This tag item is a comma-separated list of tags such as red,green.
Is this approach correct? If so, what is the right body when using the bulk API as in the first approach, i.e. when the body is an array of actions that includes an update with the scripted_upsert flag?


(Ryan Ernst) #4

scripted_upsert means the script runs whether the document already exists or not. If you want to use docItem as the initial document for the script to act on, maybe try passing it in as a param to the script.
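
One way to read this suggestion, as an untested sketch: send an empty upsert, pass docItem in params, and let the script copy it in only when the document starts out empty. The helper name scriptedUpsertFromParams, the params.doc key, and the Painless source are assumptions for illustration; the script also assumes that tag is stored as a list.

```javascript
// Hypothetical helper: returns the two bulk lines for one scripted upsert.
function scriptedUpsertFromParams(id, docItem, newTag) {
  // Painless source (assumed, not run against a cluster): seed the document
  // from params.doc when it is empty, then append the tag if it is missing.
  const source = [
    "if (ctx._source.isEmpty()) { ctx._source.putAll(params.doc); }",
    "def tags = ctx._source.tag == null ? new ArrayList() : new ArrayList(ctx._source.tag);",
    "if (!tags.contains(params.tag)) { tags.add(params.tag); }",
    "ctx._source.tag = tags;"
  ].join(" ");

  return [
    { update: { _id: id } },
    {
      scripted_upsert: true,
      script: {
        lang: "painless",
        source: source,
        params: { doc: docItem, tag: newTag }
      },
      upsert: {} // empty on purpose: the script fills the new document
    }
  ];
}

const lines = scriptedUpsertFromParams("42", { title: "demo", tag: ["red"] }, "green");
console.log(JSON.stringify(lines, null, 2));
```

Copying the tag list before appending avoids modifying the list that came in through params, at the cost of one extra allocation per update.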


(Loreto Parisi) #5

Thanks, but it is still not clear how to do this in practice. There is no example of this use case in the docs: I want to update an existing document docItem or create it from scratch, and at the same time use the script with scripted_upsert so that I can append to the tag list. Supposing I pass docItem in the parameters, how can I handle both cases (docItem exists, docItem is new) and append to the docItem['tag'] field in the script?
Thank you.
I have also asked here, in case you can provide more details: https://stackoverflow.com/questions/53394321/update-index-with-script-in-elasticsearch-bulk-javascript-api


(Ryan Ernst) #6

I believe the upsert element is still valid when doing a scripted upsert; it is just merged in with the doc after the script is run. The scripted upsert example in the docs I posted before looks pretty close to what you want.

The script would contain the code to add your tags to ctx._source. The upsert would contain your initial document. The script will run, adding the tags, and then if the doc doesn't exist, it will be merged with the upsert element. If you have changes, then instead of passing upsert, pass the doc into your script via params. You would then need to do your own merging with a potentially already existing doc.
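
The first variant described here (upsert holds the initial document, the script adds the tag) could be assembled into a bulk body roughly as follows. This is a sketch under assumptions: buildScriptedUpsertBody and the { id, doc } item shape are made up for illustration, tag is assumed to be a list, and the Painless source has not been run against a cluster.

```javascript
// Painless source (assumed): create the list if missing, then append once.
const APPEND_TAG =
  "if (ctx._source.tag == null) { ctx._source.tag = []; } " +
  "if (!ctx._source.tag.contains(params.tag)) { ctx._source.tag.add(params.tag); }";

// Hypothetical helper: one update action per item, all adding the same tag.
function buildScriptedUpsertBody(items, newTag) {
  // Each document contributes two entries: the action line and its payload.
  return items.flatMap(({ id, doc }) => [
    { update: { _id: id } },
    {
      scripted_upsert: true,
      script: { lang: "painless", source: APPEND_TAG, params: { tag: newTag } },
      upsert: doc // initial document, used when the id does not exist yet
    }
  ]);
}

const bulkBody = buildScriptedUpsertBody(
  [
    { id: "1", doc: { tag: ["red", "green"] } },
    { id: "2", doc: { tag: [] } }
  ],
  "blue"
);
console.log(bulkBody.length); // 4
```

The array would then go in the body of a call such as client.bulk({ index: myIndex, type: '_doc', body: bulkBody }), mirroring the call shape used earlier in the thread.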