Hi everyone.
A few days ago we noticed strange behavior in our indexing pipeline. We use Painless scripts together with the Bulk API to merge documents that have a 1:N relationship.
We noticed that when we bulk-update a single document with the Painless script, the document ends up with multiple identical inner objects instead of one.
How to reproduce:
Start a Docker container:
docker container run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.8.1
Create a simple index:
curl -XPUT "http://localhost:9200/test-index" | json_pp
Create a stored Painless script:
curl -XPOST "http://localhost:9200/_scripts/inner-upsert" -H 'Content-Type: application/json' -d'
{
  "script": {
    "lang": "painless",
    "code": "def record = params.data;\nif (ctx.op == \"create\") {\n ctx._source.Ver = 1;\n}\n\ndef empty = ctx._source.inner == null;\nif (empty){\n ctx._source.inner = [];\n ctx._source.inner.add(record);\n} else {\n if (ctx._source.inner.size() > 100){\n throw new IllegalStateException(\"There is too much inner documents\")\n }\n\n ctx._source.inner.add(record);\n if (ctx._source._InnerUpdates == null){\n ctx._source._InnerUpdates = 0\n }\n ctx._source._InnerUpdates = ctx._source._InnerUpdates + 1\n}\n\nctx;"
  }
}' | json_pp
When we upsert this single document through the Bulk API with the Painless script, we get multiple inner documents:
curl -XPOST "http://localhost:9200/_bulk?_source=true" -H 'Content-Type: application/json' -d'
{"update":{"_index":"test-index","_type":"_doc","_id":"1-1","retry_on_conflict":1}}
{"script":{"id":"inner-upsert","params":{"data":{"param1":1,"param2":2}}},"upsert":{},"scripted_upsert":true}
' | json_pp
Bulk upsert result (inner array with three identical documents):
curl -XGET "http://localhost:9200/test-index/_search" | json_pp
"_source" : {
   "Ver" : 1,
   "_InnerUpdates" : 2,
   "inner" : [
      {
         "param1" : 1,
         "param2" : 2
      },
      {
         "param1" : 1,
         "param2" : 2
      },
      {
         "param2" : 2,
         "param1" : 1
      }
   ]
}
In contrast, when we recreate the index and upsert the same document with the same script through the Update API instead of the Bulk API, we get a single inner document, which is what we expect:
curl -XDELETE "http://localhost:9200/test-index" | json_pp
curl -XPUT "http://localhost:9200/test-index" | json_pp
curl -XPOST "http://localhost:9200/test-index/_doc/1-1/_update" -H 'Content-Type: application/json' -d'
{
  "script": {
    "id": "inner-upsert",
    "params": {
      "data": {
        "param1": 1,
        "param2": 2
      }
    }
  },
  "upsert": {},
  "scripted_upsert": true
}' | json_pp
Update result (inner array with a single document):
curl -XGET "http://localhost:9200/test-index/_search" | json_pp
"_source" : {
   "Ver" : 1,
   "inner" : [
      {
         "param1" : 1,
         "param2" : 2
      }
   ]
}
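The bulk result is consistent with the stored script being executed three times against the same `_source` within one request: the first pass takes the `empty` branch and sets `Ver`, and the next two passes take the `else` branch, appending two more copies and bumping `_InnerUpdates` to 2. The sketch below is a plain Python model of the script's logic, not Elasticsearch itself; `inner_upsert`, `run` and `MAX_INNER` are hypothetical names, and treating every pass as `op == "create"` on an already-modified `_source` is an assumption.

```python
# Plain-Python model of the stored "inner-upsert" Painless script.
# This is NOT Elasticsearch: it only replays the script's branching logic so the
# resulting _source can be compared with what the two API calls returned.

MAX_INNER = 100  # mirrors the "size() > 100" guard in the stored script


def inner_upsert(ctx, params):
    """Apply one execution of the script's logic to ctx (mutated in place)."""
    record = params["data"]
    source = ctx["_source"]
    if ctx["op"] == "create":
        source["Ver"] = 1
    if source.get("inner") is None:  # Painless: ctx._source.inner == null
        source["inner"] = [record]
    else:
        if len(source["inner"]) > MAX_INNER:
            raise RuntimeError("There is too much inner documents")
        source["inner"].append(record)
        if source.get("_InnerUpdates") is None:
            source["_InnerUpdates"] = 0
        source["_InnerUpdates"] += 1
    return ctx


def run(times):
    """Run the script `times` times against one scripted-upsert context.

    Assumption: with "upsert": {} and "scripted_upsert": true the script starts
    from an empty _source with op == "create", and every repeated pass sees the
    same, already-modified _source.
    """
    ctx = {"op": "create", "_source": {}}
    params = {"data": {"param1": 1, "param2": 2}}
    for _ in range(times):
        inner_upsert(ctx, params)
    return ctx["_source"]


if __name__ == "__main__":
    print("one pass:    ", run(1))
    print("three passes:", run(3))
```

One pass reproduces the `_update` result and three passes reproduce the bulk result field for field. This only shows the numbers fit a repeated execution on the bulk path; it does not explain why 6.8.1 would run the script more than once.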
We tried the same procedure with the following Elasticsearch versions:
- v5.6.4: no issue
- v6.8.1: the issue is present
- v7.6.2: no issue
Can anyone explain why this is happening? Is it a bug or expected behavior?
Thank you in advance.