Handling Conflicts

Hey,

I know that Elasticsearch internally uses version numbers for optimistic concurrency control. This guarantees that only one update to a document is processed at a time.
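For illustration, this is roughly what the version check looks like when used by hand (a sketch; the shows index, show type, and version number are made up for the example). The write only succeeds if the document's current version is still the one we read; otherwise Elasticsearch rejects it with a version conflict instead of silently overwriting:

GET /shows/show/332

PUT /shows/show/332?version=7
{
   "title": "Some Show",
   "timestamp": 112255
}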

However, what happens when multiple updates to the same document are processed in one bulk request? Or, even worse, multiple updates to the same document from different bulks? In our system we use partial updates. In the case of adding nested documents or incrementally updating a property, that's fine: I don't care which nested document is added before the other, and an incremental value update doesn't care about order either. But in the case of an update that overrides some properties, the order DOES matter, because the last write wins.
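To make the distinction concrete, here is a sketch of the two kinds of partial updates (the field names view_count, tags, and title are made up for the example). The first two requests are order-independent; the last one is not:

POST /shows/show/332/_update
{
   "script": {
     "inline": "ctx._source.view_count += params.delta",
     "lang": "painless",
     "params": { "delta": 1 }
   }
}

POST /shows/show/332/_update
{
   "script": {
     "inline": "ctx._source.tags.add(params.tag)",
     "lang": "painless",
     "params": { "tag": "comedy" }
   }
}

POST /shows/show/332/_update
{
   "doc": { "title": "New Title" }
}

If two clients send the last request concurrently with different titles, whichever write is applied last wins, regardless of which was issued first.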

In this illustration, I have 3 different groups of updates (gray, green, orange), each of which partially updates the documents. While concurrent add and incremental updates are okay (order doesn't matter, so no conflicts arise), concurrent override/replace requests are dangerous.

One solution is some kind of timestamp check, as mentioned here: only apply the update if it is newer than the document's current state:

POST /shows/show/332/_update
{
   "scripted_upsert": true,
   "script" : {
     "inline": "if (ctx._source.timestamp >= params.document.timestamp) { ctx.op = 'none' } else { ctx._source = params.document }",
     "lang": "painless",
     "params": {
        "document": {
          "timestamp": 112255
        }
     }
   },
   "upsert": {
      "timestamp": 33333
   }
}
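A related note: for conflicts caused by concurrent commutative updates (the add/increment kind), Elasticsearch has a simpler built-in mechanism, retry_on_conflict, which re-fetches the document and re-applies the update when a version conflict occurs (a sketch, reusing the made-up view_count field from above):

POST /shows/show/332/_update?retry_on_conflict=3
{
   "script": {
     "inline": "ctx._source.view_count += params.delta",
     "lang": "painless",
     "params": { "delta": 1 }
   }
}

This only helps when re-applying in any order is correct, which is exactly not the case for the override group, so the timestamp check above is still needed there.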

Another solution might be to make sure that only one bulk request handles the gray-group updates, hoping that updates within the same bulk are applied in their request order. I didn't find any related information about this in the Elasticsearch documentation, but I did see it mentioned in this discussion.

What do you think? Is it an appropriate solution?

These may be interesting to read:

Apart from concurrency and correctness, be aware that very frequent updates to documents can cause poor performance. If a document that has not yet been written to a segment is updated, this will cause a refresh to occur. You could therefore end up with lots of very small refreshes, which is inefficient.
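If you want to see whether that is happening, the index stats API exposes refresh counts (a sketch; shows stands in for your index name):

GET /shows/_stats/refresh

If refresh.total in the response grows much faster than your refresh_interval alone would explain, updates to not-yet-refreshed documents are likely forcing extra refreshes.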

Hey, thanks. I actually read all of those posts before publishing mine. All of them mention the concurrency problem and the version control mechanism, but they don't answer my question directly.

In the bulk post, I can see that the versions of the document grow according to the order of the requests. But I don't understand whether that is guaranteed or just shown for demonstration purposes.
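To be concrete, this is the shape of the case I mean (a sketch of two updates to the same document in a single _bulk request; index and field names are made up):

POST /_bulk
{ "update": { "_index": "shows", "_type": "show", "_id": "332" } }
{ "doc": { "title": "First" } }
{ "update": { "_index": "shows", "_type": "show", "_id": "332" } }
{ "doc": { "title": "Second" } }

In the response, the _version of the second item is higher than that of the first, but I can't tell from the documentation whether the items are guaranteed to be applied in this order.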

Thanks for the response and the concerns. Most of the time we are talking about 5 to 15 updates, and the maximum version number I have ever seen for a single document was 30. Some of the updates occur right after one another (a delay of seconds), and some might occur hours or days later.

The override is done automatically by a machine, and might account for the first 5 updates. The other partial updates are done by customers, and usually no more than 3 customers work on the same document.

Considering those numbers, do you think the approach is appropriate? So far we haven't suffered from performance issues while indexing or updating our documents.
