Hey,
I know that Elasticsearch internally uses versioning for optimistic concurrency control, which guarantees that only one update to a document is processed at a time.
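For example, with internal versioning, an index request that specifies a stale version is rejected instead of applied (a sketch; the version number and body are illustrative):

PUT /Shows/Show/332?version=5
{
  "title": "some show"
}

If the document's current version is no longer 5, Elasticsearch returns a version_conflict_engine_exception rather than overwriting the newer state.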
However, what happens when multiple updates to the same document are processed in one bulk request? Or even worse, when the same document is updated from different bulk requests? In our system, we use partial updates. Adding nested documents or incrementing a property is fine: I don't care which nested document is added before the other, and an increment doesn't depend on order either. But for an update that overrides properties, the order DOES matter, because the last write wins.
In this illustration, I have 3 different groups of partial updates to the documents (gray, green, orange). While concurrent add and increment updates are fine (no conflicts can occur), concurrent override/replace requests are dangerous.
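For instance, an increment update like the one below commutes, so concurrent updates within that group are harmless regardless of order (a sketch; view_count is a made-up field):

POST /Shows/Show/332/_update
{
  "script": {
    "inline": "ctx._source.view_count += params.delta",
    "lang": "painless",
    "params": {
      "delta": 1
    }
  }
}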
A solution for that is some kind of timestamp check, like the one mentioned here: apply the update only if it is newer than the stored document:
POST /Shows/Show/332/_update
{
  "scripted_upsert": true,
  "script": {
    "inline": "if (ctx._source.last_update_time >= params.document.last_update_time) { ctx.op = 'none' } else { ctx._source = params.document }",
    "lang": "painless",
    "params": {
      "document": {
        "last_update_time": 112255
      }
    }
  },
  "upsert": {
    "last_update_time": 33333
  }
}
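With this guard, whichever request carries the greatest last_update_time wins regardless of arrival order: if the request above has been applied and a later request arrives with last_update_time 99999, the script takes the first branch, sets ctx.op to 'none', and the stale write is silently dropped.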
Another solution might be to make sure that only one bulk request handles the gray group's updates, hoping that updates to the same document within a single bulk are applied in order. I didn't find any information about this in the Elasticsearch documentation, but I did see it mentioned in this discussion.
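For example, routing all of the gray group's writes for a document into one _bulk request would look like this (a sketch; gray_field is a made-up field):

POST /_bulk
{ "update": { "_index": "Shows", "_type": "Show", "_id": "332" } }
{ "script": { "inline": "ctx._source.gray_field = params.v", "lang": "painless", "params": { "v": "first" } } }
{ "update": { "_index": "Shows", "_type": "Show", "_id": "332" } }
{ "script": { "inline": "ctx._source.gray_field = params.v", "lang": "painless", "params": { "v": "second" } } }

If the two actions are applied in order, gray_field ends up as "second"; whether that ordering is actually guaranteed for the same document is exactly the part I couldn't confirm.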
What do you think? Is it an appropriate solution?