Optimistic Concurrency on new documents

I have a process that performs the following steps each time a message is received:

  • Attempt to get a document, noting its seq_no and primary_term if it is found
  • Create a new document, based on the results of some in-memory merge of the received message and (optionally) the retrieved document
  • Index the new document under the same ID, passing the previously read seq_no/primary_term (or leaving them off if no document was found) to make use of the optimistic locking that ES provides
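For anyone following along, the last step can be sketched roughly as below. This is only a minimal sketch: the function name is made up for illustration, and `existing` is assumed to be the parsed JSON body of the GET response (which carries `_seq_no` and `_primary_term`), or `None` when the document was not found. `if_seq_no` and `if_primary_term` are the query parameters the index API accepts for a conditional write.

```python
from typing import Optional


def conditional_index_params(existing: Optional[dict]) -> dict:
    """Build the query-string parameters for the index request.

    `existing` is assumed to be the parsed GET response for the document,
    or None if the GET reported that the document was not found.
    """
    if existing is None:
        # Nothing was read, so there is no seq_no/primary_term to send.
        return {}
    return {
        "if_seq_no": existing["_seq_no"],
        "if_primary_term": existing["_primary_term"],
    }
```

The returned dict would be passed as the query string of `PUT /<index>/_doc/<id>`, alongside the merged document as the request body.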

If a document already exists at a certain ID, and 2 messages are then received concurrently, the process works fine:

  • message 1 received, gets document from ES with seq_no/primary_term
  • message 2 received, gets same document with same seq_no/primary_term
  • whichever process finishes first is able to save the new document
  • the slower process fails with a VersionConflictEngineException, which I can then retry or otherwise handle myself

If the document does not exist though, and 2 messages are received concurrently, the process fails:

  • message 1 received, finds no document in ES so has no seq_no or primary_term
  • message 2 received, also finds no document in ES
  • whichever process finishes first saves its new document, leaving off seq_no/primary_term
  • the slower process is also able to save, since it leaves off seq_no/primary_term as well, but I would like this save to fail

It would be nice if you could provide (0,0) as a starting point or something similar, so you can represent the notion of "index only if document doesn't exist".

I have dug through the elasticsearch source a bit and this seems to be where it does the version check: https://github.com/elastic/elasticsearch/blob/57859413eaf1f59357eb6a9875ca0ae51a76bbb3/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java#L992
It looks like if you provide any seq_no/primary_term at all while the index does not yet contain the document, the write will always fail, so there is no valid value (like 0,0) you can send for a nonexistent document. All you can do is leave them unassigned, which puts you back in the situation I described above.
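The three behaviours described so far can be reproduced with a toy in-memory model. To be clear, this is not Elasticsearch code: `ToyIndex`, `Stored`, `VersionConflict` and `attempt` are hypothetical names, and the model only mimics the rule as described above (a supplied seq_no/primary_term must match an existing document, and no condition means no check).

```python
from dataclasses import dataclass
from typing import Dict, Optional


class VersionConflict(Exception):
    """Stand-in for VersionConflictEngineException (HTTP 409)."""


@dataclass
class Stored:
    source: dict
    seq_no: int
    primary_term: int


class ToyIndex:
    """In-memory stand-in for an index; an assumption-laden toy, not ES."""

    def __init__(self) -> None:
        self.docs: Dict[str, Stored] = {}
        self.next_seq_no = 0
        self.primary_term = 1

    def get(self, doc_id: str) -> Optional[Stored]:
        return self.docs.get(doc_id)

    def index(self, doc_id: str, source: dict,
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> Stored:
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # A condition was supplied, so it has to match a stored document.
            if current is None:
                raise VersionConflict("document does not exist")
            if (current.seq_no, current.primary_term) != (if_seq_no, if_primary_term):
                raise VersionConflict("seq_no/primary_term mismatch")
        # No condition supplied, or the condition matched: write goes through.
        stored = Stored(source, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        self.docs[doc_id] = stored
        return stored


def attempt(idx: ToyIndex, doc_id: str, source: dict,
            if_seq_no: Optional[int] = None,
            if_primary_term: Optional[int] = None) -> str:
    """Return "ok" or "conflict" instead of raising, for easy comparison."""
    try:
        idx.index(doc_id, source, if_seq_no, if_primary_term)
        return "ok"
    except VersionConflict:
        return "conflict"


idx = ToyIndex()

# Missing document, two racing writers, neither can send a condition.
race_on_missing = [attempt(idx, "a", {"from": "msg1"}),
                   attempt(idx, "a", {"from": "msg2"})]

# Existing document, both writers read the same seq_no/primary_term.
seen = idx.get("a")
race_on_existing = [attempt(idx, "a", {"from": "msg3"}, seen.seq_no, seen.primary_term),
                    attempt(idx, "a", {"from": "msg4"}, seen.seq_no, seen.primary_term)]

# Guessing a starting value such as (0, 0) for an ID that does not exist.
guess_on_missing = attempt(idx, "b", {"from": "msg5"}, 0, 0)
```

In this model the race on the missing document lets both writes through, the race on the existing document rejects the second writer, and the (0, 0) guess is rejected outright.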

This turned out to be a good rubber duck debugging session - as I was typing this out, the "index only if document doesn't exist" clued me in to what was missing. So here is how I solved it in case it helps anyone else:

If the document wasn't found, don't use seq_no/primary_term. Instead use op_type = create, which does exactly what I was looking for (index only if the document doesn't already exist). The slower writer still gets a 409 Conflict response, so you can catch and handle it the same way as a version conflict.
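Put together, the read-merge-write loop might look something like the sketch below. The helper names and the `fetch`/`merge`/`send` callables are hypothetical placeholders for whatever client code you already have: `fetch` is assumed to return the parsed GET response or `None`, and `send` is assumed to issue the index request with the given query parameters and return the HTTP status code. Only `op_type`, `if_seq_no` and `if_primary_term` are actual index API parameters.

```python
from typing import Callable, Optional


def write_params(existing: Optional[dict]) -> dict:
    """Choose the concurrency guard for the index request."""
    if existing is None:
        # No document was read: only succeed if the ID is still unused.
        return {"op_type": "create"}
    # A document was read: only succeed if it has not changed since.
    return {
        "if_seq_no": existing["_seq_no"],
        "if_primary_term": existing["_primary_term"],
    }


def upsert_with_retry(fetch: Callable[[], Optional[dict]],
                      merge: Callable[[object, Optional[dict]], dict],
                      send: Callable[[dict, dict], int],
                      message: object,
                      max_attempts: int = 3) -> bool:
    """Re-read and re-merge whenever the guarded write loses the race."""
    for _ in range(max_attempts):
        existing = fetch()
        previous_source = existing["_source"] if existing else None
        new_doc = merge(message, previous_source)
        status = send(new_doc, write_params(existing))
        if status in (200, 201):
            return True
        if status != 409:
            raise RuntimeError(f"unexpected status {status}")
        # 409 Conflict: someone else wrote first, so loop and try again.
    return False
```

Retrying by re-reading is just one way to handle the 409; depending on the message semantics, dropping or dead-lettering the message may be more appropriate.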
