Handling Conflicts

panda2004 · June 9, 2018, 8:48pm

Hey,

I know that Elasticsearch uses versioning internally for optimistic concurrency control. It guarantees that only one update to a document is processed at a time.

However, what happens when multiple updates to the same document are processed in one bulk request? Or even worse, multiple updates to the same document from different bulk requests? In our system, we use partial updates. Adding nested documents or incrementing a property is okay: I don't care which nested document is added first, and an increment doesn't depend on order either. But for an update that overrides some properties, the order DOES matter, because the last write wins.

In this illustration, I have 3 different groups (gray, green, orange) in the documents, each of which is updated partially. While concurrent add and increment updates are okay (no conflicts can occur), concurrent override/replace requests are dangerous.
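To make the difference concrete, here is a minimal sketch of version-checked writes. It is a toy in-memory model, not Elasticsearch client code, and `Store`, `VersionConflict`, and `update` are hypothetical names introduced only for illustration. It shows that a stale write is rejected, that increments give the same result in any order, and that overrides end up as last write wins.

```python
# Toy in-memory model of optimistic concurrency; all names are hypothetical.
class VersionConflict(Exception):
    pass


class Store:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source dict)

    def get(self, doc_id):
        version, source = self.docs.get(doc_id, (0, {}))
        return version, dict(source)

    def put(self, doc_id, source, expected_version):
        # Reject the write if someone else changed the document in between.
        current_version, _ = self.docs.get(doc_id, (0, {}))
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, dict(source))


def update(store, doc_id, change, retries=3):
    """Read-modify-write, retried when the version check fails."""
    for _ in range(retries + 1):
        version, source = store.get(doc_id)
        change(source)
        try:
            store.put(doc_id, source, expected_version=version)
            return
        except VersionConflict:
            continue
    raise VersionConflict(doc_id)


store = Store()

# Increments commute: swapping these two calls gives the same document.
update(store, "332", lambda s: s.update(views=s.get("views", 0) + 1))
update(store, "332", lambda s: s.update(views=s.get("views", 0) + 1))

# Overrides do not commute: whichever call is applied last wins.
update(store, "332", lambda s: s.update(title="B"))
update(store, "332", lambda s: s.update(title="A"))
result = store.get("332")

# Two writers that read the same version: the second write is rejected.
version, source = store.get("332")
store.put("332", {**source, "title": "C"}, expected_version=version)
stale_rejected = False
try:
    store.put("332", {**source, "title": "D"}, expected_version=version)
except VersionConflict:
    stale_rejected = True
```

The version check only prevents two writers from silently overwriting each other. It says nothing about which override should win, which is the ordering problem described above.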

One solution is some kind of timestamp check, like the one mentioned here, meaning the update is applied only if it is newer:

POST /Shows/Show/332/_update
{
   "scripted_upsert": true,
   "script" : {
     "inline": "if (ctx._source.timestamp >= params.document.timestamp) { ctx.op = 'none' } else { ctx._source = params.document }",
     "lang": "painless",
     "params": {
        "document": {
          "timestamp": 112255
        }
     }
   },
   "upsert": {
      "timestamp": 33333
   }
}
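As a sanity check of the idea, the guard can be modeled in a few lines of plain Python. This is only a sketch of the logic, not of how Elasticsearch executes the script. The helper `apply_if_newer` and the field name `timestamp` are assumptions made for the example. It applies the same three updates in every possible order and collects the final result.

```python
import itertools


def apply_if_newer(stored, incoming):
    """Keep the stored document unless the incoming one is strictly newer."""
    if stored is not None and stored["timestamp"] >= incoming["timestamp"]:
        return stored   # plays the role of ctx.op = 'none'
    return incoming     # plays the role of ctx._source = params.document


updates = [
    {"timestamp": 112255, "title": "newest"},
    {"timestamp": 33333, "title": "oldest"},
    {"timestamp": 70000, "title": "middle"},
]

# Apply the updates in every arrival order and record the final title.
finals = set()
for order in itertools.permutations(updates):
    doc = None
    for incoming in order:
        doc = apply_if_newer(doc, incoming)
    finals.add(doc["title"])
```

Every ordering ends with the newest document, so arrival order stops mattering for overrides. Note that the guard replaces the whole source, so fields written by other partial updates would be lost unless they are also part of the incoming document.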

Another solution might be making sure that only one bulk request handles the gray group's updates. Hopefully, updates in the same group are then applied in their order. I didn't find anything about this in the Elasticsearch documentation, but I did see it mentioned in this discussion.
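On the client side, this could look roughly like the sketch below: every update for the same document and group is routed to the same queue, in arrival order, so a single sender handles them. The names `worker_for` and `partition`, the `seq` field, and the number of workers are all hypothetical. This only covers the ordering of what the client sends. Whether the server applies the operations in that order is exactly the open question here.

```python
import zlib
from collections import defaultdict


def worker_for(doc_id, group, num_workers):
    """Stable mapping from (document, group) to a worker index."""
    key = f"{doc_id}:{group}".encode("utf-8")
    return zlib.crc32(key) % num_workers


def partition(updates, num_workers):
    """Split updates into per-worker queues, preserving arrival order."""
    queues = defaultdict(list)
    for u in updates:
        queues[worker_for(u["doc_id"], u["group"], num_workers)].append(u)
    return queues


updates = [
    {"doc_id": "332", "group": "gray", "seq": 1},
    {"doc_id": "332", "group": "green", "seq": 2},
    {"doc_id": "332", "group": "gray", "seq": 3},
    {"doc_id": "401", "group": "gray", "seq": 4},
    {"doc_id": "332", "group": "gray", "seq": 5},
]

queues = partition(updates, num_workers=4)

# Sequence numbers of the gray updates for document 332, per queue.
gray_332 = [
    [u["seq"] for u in q if (u["doc_id"], u["group"]) == ("332", "gray")]
    for q in queues.values()
]
gray_332 = [seqs for seqs in gray_332 if seqs]
```

All gray updates for document 332 land in exactly one queue and keep their relative order, so one worker can send them one after another.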

What do you think? Is it an appropriate solution?

dadoonet · June 10, 2018, 3:10am

May be interesting to read:

And

Christian_Dahlqvist · June 10, 2018, 5:46am

Apart from concurrency and correctness, be aware that very frequent updates to documents can cause poor performance. If a document that has not yet been written to a segment is updated, this will cause a refresh to occur. You could therefore end up with lots of very small refreshes, which is inefficient.

panda2004 · June 11, 2018, 5:56am

Hey, thanks. I actually read all of those posts before publishing mine. All of them mention the concurrency problem and the version control mechanism, but they don't answer my question directly.

In the bulk post, I can see that the document versions grow according to the order of the requests. But I don't understand whether that is guaranteed or just for demonstration purposes.

panda2004 · June 11, 2018, 6:02am

Thanks for the response and concerns. Most of the time, we are talking about 5 to 15 updates, and the highest version number I have ever seen for a single document was 30. Some of the updates occur right after one another (seconds apart), and some might occur after hours or days.

The override is done automatically by a machine, and might account for the first 5 updates. But the other partial updates are done by customers, and usually no more than 3 customers work on the same document.

Given those numbers, do you think it is more appropriate? So far we haven't suffered from performance issues while indexing / updating our documents.

