Scripted bulk upserting issue

Hi everyone.

A few days ago we noticed strange behavior in our indexing pipeline. We use Painless scripts together with the Bulk API to merge documents that have a 1:N relationship into a single document.
We noticed that when we bulk update a single document with the Painless script, multiple identical inner objects are created as a result.
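The pipeline described above can be sketched as a small helper that turns (parent ID, child record) pairs into a `_bulk` body, with one scripted upsert per child. This is a minimal Python sketch, not our production code: the function name `build_bulk_body` is hypothetical, and the action and script lines simply mirror the request from the reproduction below (stored script `inner-upsert`, empty `upsert` document, `scripted_upsert: true`, `retry_on_conflict: 1`).

```python
import json


def build_bulk_body(index, records):
    """Hypothetical helper: build an NDJSON _bulk body.

    `records` is an iterable of (parent_id, child) pairs. Each pair becomes
    one "update" action that runs the stored "inner-upsert" script with the
    child as params.data, mirroring the request used in this post.
    """
    lines = []
    for parent_id, child in records:
        action = {
            "update": {
                "_index": index,
                "_type": "_doc",  # same type as in the 6.8.1 request below
                "_id": parent_id,
                "retry_on_conflict": 1,
            }
        }
        body = {
            "script": {"id": "inner-upsert", "params": {"data": child}},
            "upsert": {},
            "scripted_upsert": True,
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


print(build_bulk_body("test-index", [("1-1", {"param1": 1, "param2": 2})]))
```

Several children of the same parent simply produce several `update` actions with the same `_id`, each of which is expected to run the script once.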

How to reproduce:

Start a Docker container:

    docker container run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.8.1

Create a simple index:

    curl -XPUT "http://localhost:9200/test-index" | json_pp

Create a painless script:

    curl -XPOST "http://localhost:9200/_scripts/inner-upsert" -H 'Content-Type: application/json' -d'
    {
      "script": {
        "lang": "painless",
        "code": "def record = params.data;\nif (ctx.op == \"create\") {\n    ctx._source.Ver = 1;\n}\n\ndef empty = ctx._source.inner == null;\nif (empty){\n    ctx._source.inner = [];\n    ctx._source.inner.add(record);\n} else {\n    if (ctx._source.inner.size() > 100){\n        throw new IllegalStateException(\"There are too many inner documents\");\n    }\n\n    ctx._source.inner.add(record);\n    if (ctx._source._InnerUpdates == null){\n        ctx._source._InnerUpdates = 0;\n    }\n    ctx._source._InnerUpdates = ctx._source._InnerUpdates + 1;\n}\n\nctx;"
      }
    }' | json_pp

When we send a bulk request containing a single scripted upsert, we get multiple inner documents as a result:

    curl -XPOST "http://localhost:9200/_bulk?_source=true" -H 'Content-Type: application/json' -d'
    {"update":{"_index":"test-index","_type":"_doc","_id":"1-1","retry_on_conflict":1}}
    {"script":{"id":"inner-upsert","params":{"data":{"param1":1,"param2":2}}},"upsert":{},"scripted_upsert":true}
    ' | json_pp

Bulk upsert result (the inner array contains three identical documents):

    curl -XGET "http://localhost:9200/test-index/_search" | json_pp

    "_source" : {
        "Ver" : 1,
        "_InnerUpdates" : 2,
        "inner" : [
            {
                "param1" : 1,
                "param2" : 2
            },
            {
                "param1" : 1,
                "param2" : 2
            },
            {
                "param2" : 2,
                "param1" : 1
            }
        ]
    }

In contrast, when we upsert the same document with the same Painless script through the Update API, we get a single inner document as a result (which is expected):

    curl -XDELETE "http://localhost:9200/test-index" | json_pp
    curl -XPUT "http://localhost:9200/test-index" | json_pp
    curl -XPOST "http://localhost:9200/test-index/_doc/1-1/_update" -H 'Content-Type: application/json' -d'
    {
      "script": {
        "id": "inner-upsert",
        "params": {
          "data": {
            "param1": 1,
            "param2": 2
          }
        }
      },
      "upsert": {},
      "scripted_upsert": true
    }' | json_pp

Update result (the inner array contains a single document):

    curl -XGET "http://localhost:9200/test-index/_search" | json_pp

    "_source" : {
        "Ver" : 1,
        "inner" : [
            {
                "param1" : 1,
                "param2" : 2
            }
        ]
    }
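To check what the bulk output implies, here is a minimal Python port of the `inner-upsert` script. The names `inner_upsert` and `simulate` are hypothetical, and the port assumes the script sees an empty `_source` with `ctx.op == "create"` on its first run and the already-created document on any later run. Running the body once reproduces the Update API result, and running it three times on the same document reproduces the bulk result field for field, including `_InnerUpdates: 2`. This only shows that the bulk output is consistent with the script being applied three times; it does not explain why 6.8.1 does that.

```python
def inner_upsert(source, record, op):
    """Hypothetical Python port of the "inner-upsert" Painless script.

    `source` stands in for ctx._source and `op` for ctx.op.
    """
    if op == "create":
        source["Ver"] = 1
    if source.get("inner") is None:
        # First record: start the inner array.
        source["inner"] = [record]
    else:
        if len(source["inner"]) > 100:
            # The Painless script throws IllegalStateException here.
            raise RuntimeError("There are too many inner documents")
        source["inner"].append(record)
        if source.get("_InnerUpdates") is None:
            source["_InnerUpdates"] = 0
        source["_InnerUpdates"] += 1
    return source


def simulate(executions):
    """Apply the script `executions` times to one initially missing document."""
    record = {"param1": 1, "param2": 2}
    source = {}  # the empty "upsert" document from the request
    for i in range(executions):
        # ctx.op is "create" only while the document does not exist yet;
        # later runs are assumed to see "index" (Ver is 1 either way).
        op = "create" if i == 0 else "index"
        inner_upsert(source, dict(record), op)
    return source


print(simulate(1))  # same content as the Update API result
print(simulate(3))  # same content as the bulk result
```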

We tried the same procedure with the following Elasticsearch versions:

  • v5.6.4 - there is no issue
  • v6.8.1 - the issue is present
  • v7.6.2 - there is no issue

Can anyone explain why this is happening? Is this a bug or expected behavior?

Thank you in advance.

It seems that this issue was resolved in version 7.5.1:
