Why is retry_on_conflict necessary?

We are struggling to understand why version conflicts occur and why retry_on_conflict is a sensible strategy for resolving them.

Consider Document _id: 1 which has value foo: 1 and _version: 1. If several processes try to update this:

  • AppProcessX: foo: 2
  • AppProcessY: foo: 3

Then I expect that the first process writes foo: 2, _version: 2 and the next process writes foo: 3, _version: 3. The order in which Elasticsearch receives the requests should be what matters. Since we do not provide a _version parameter from our ES client, we are saying: just update, and don't worry about overwriting values in the wrong order. We expect ES to do optimistic locking and not throw version errors.

If we assume that ES has multiple processes which handle these updates in the "wrong" order, i.e. not in the order in which the App Processes generated them, we expect ES to just write them in that wrong order, so foo: 2 will be the final and incorrect result.

  • ESProcess1: foo: 3
  • ESProcess2: foo: 2

We do not understand how ES gets mixed up. If ES reads _version: 1 shortly before it writes foo: 3, then it will write foo: 3, _version: 2 and then foo: 2, _version: 3. This is fine for our purposes.

How can ES decide that foo: 2 is in conflict with foo: 3? I can only imagine that ES is caching the _version field BEFORE the ES Queue is processed, so it gets:

  • ESProcess1: foo: 2, _version: 1 => OK _version: 2
  • ESProcess1: foo: 3, _version: 1 => FAIL version_conflict_engine_exception

The failure occurs because ES "knows" it should be updating _version: 1 to _version: 2, but it then sees that _version is already 2.
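The scenario above can be reproduced with a small toy model. This is only a sketch of a read-then-write update guarded by a version check. It is not Elasticsearch's actual implementation, and the names ToyStore and VersionConflict are made up for illustration.

```python
class VersionConflict(Exception):
    """Stand-in for version_conflict_engine_exception in this toy model."""


class ToyStore:
    """A single document with a version counter (not real Elasticsearch code)."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1

    def get(self):
        # Read phase: return a copy of the document and the version it had.
        return dict(self.source), self.version

    def index(self, new_source, expected_version):
        # Write phase: succeed only if nobody wrote since our read.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.source = dict(new_source)
        self.version += 1
        return self.version


store = ToyStore({"foo": 1})

# Both updates finish their read phase before either one writes.
doc_x, seen_x = store.get()
doc_y, seen_y = store.get()

doc_x["foo"] = 2
doc_y["foo"] = 3

store.index(doc_x, seen_x)          # OK, version becomes 2
try:
    store.index(doc_y, seen_y)      # still expects version 1
    conflict = False
except VersionConflict:
    conflict = True

# A retry repeats the read phase, so it sees version 2 and then succeeds.
doc_y, seen_y = store.get()
doc_y["foo"] = 3
final_version = store.index(doc_y, seen_y)
```

In this model the conflict comes from the gap between reading and writing, not from any comparison of foo: 2 against foo: 3. The retry succeeds simply because it reads the document again.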

Is there any way, other than a retry, to just ignore versioning?
How does a retry solve any issues related to ordering? IMHO ES should just retry until it succeeds (if that's what the developer/client wants and they don't care about ordering), or just fail if the developer wants ordering. If the developer wants ordering, they should update by providing a _version field.

I can't see why retrying some fixed "golden" number of times is a good strategy.
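For reference, the retry count in question is set per request as the retry_on_conflict query parameter of the Update API. The sketch below only builds such a request without sending it. It assumes the typeless /<index>/_update/<id> endpoint of Elasticsearch 7.x or later. The index name my-index and the helper build_update_request are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_update_request(index, doc_id, partial_doc, retries):
    """Return (method, path, body) for an Update API call with retries."""
    # retry_on_conflict is passed as a query-string parameter.
    query = urlencode({"retry_on_conflict": retries})
    path = f"/{index}/_update/{doc_id}?{query}"
    # A partial-document update wraps the changed fields in "doc".
    body = json.dumps({"doc": partial_doc})
    return "POST", path, body


# Hypothetical index name and document id, for illustration only.
method, path, body = build_update_request("my-index", "1", {"foo": 3}, retries=3)
```

Here path is /my-index/_update/1?retry_on_conflict=3. The value 3 is arbitrary, which is exactly the "golden number" concern raised above.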

I do not know the answer, so I will leave it for someone else to comment on. There are, however, some additional things you need to consider.

  1. A lot of users execute updates through single requests or the bulk API, but it is also possible to perform updates through the update by query API, which runs in the background and can take some time to complete.

  2. In your example you are updating with a fixed value. Note that Elasticsearch also supports scripted updates, where you can e.g. increment fields and/or add values to arrays. With these types of operations the order and version do matter, and ignoring version conflicts would lead to data loss.
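The data-loss point in item 2 can be illustrated with a toy simulation of two overlapping increments. This is a sketch under simplified assumptions, not Elasticsearch code, and run_increments is a hypothetical helper.

```python
def run_increments(check_version):
    """Apply two overlapping +1 updates to a counter; return the final value."""
    doc = {"counter": 0, "version": 1}

    # Both updaters read the same snapshot before either one writes.
    snapshot_a = dict(doc)
    snapshot_b = dict(doc)

    def write(snapshot):
        if check_version and snapshot["version"] != doc["version"]:
            return False  # conflict: the snapshot is stale
        doc["counter"] = snapshot["counter"] + 1
        doc["version"] += 1
        return True

    write(snapshot_a)
    if not write(snapshot_b):
        # Retry: re-read the current document, then apply the increment again.
        write(dict(doc))
    return doc["counter"]


lost = run_increments(check_version=False)   # stale write overwrites the first
kept = run_increments(check_version=True)    # conflict detected, retry applied
```

Without the version check the counter ends at 1, so one increment is silently lost. With the check and a single retry it ends at 2.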