Elasticsearch delete_by_query 409 version conflict

According to the ES documentation, document indexing/deletion happens as follows:

  1. The request is received at one of the nodes.
  2. The request is forwarded to the document's primary shard.
  3. The operation is performed on the primary shard, and parallel requests are sent to the replica nodes.
  4. The primary shard node waits for a response from the replica nodes and then sends the response to the node where the request was originally received.
  5. The response is sent back to the client.

Now in my case, I am sending a create-document request to ES at time t, and then sending a request to delete the same document (using delete_by_query) at approximately t+800 milliseconds. These requests are sent via a messaging system (an internal implementation of Kafka) which guarantees that the delete request is sent to ES only after a 200 OK response for the indexing operation has been received from ES.

According to the ES documentation, delete_by_query throws a 409 version conflict only when documents matching the delete query have been updated while delete_by_query was still executing.

In my case, it is always guaranteed that the delete_by_query request is sent to ES only after a 200 OK response has been received for all the documents that have to be deleted. Hence there is no possibility of a document that is about to be deleted being created or updated while the delete_by_query operation is running.

Please let me know if I am missing something or this is an issue with ES.


Hi @Rahul_Kumar3,

I think the missing piece to make this safe is a refresh.

Delete by query basically does a search for the objects to delete and then deletes them with version conflict checking. Without a _refresh in between, the search done by _delete_by_query might return the old version of the document, leading to a version conflict when the delete is attempted.

To be certain that delete by query sees all operations done, refresh should be called, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html . Or you can use the refresh parameter on the previous indexing request, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
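The failure mode can be sketched with a toy in-memory model (illustrative Python, not Elasticsearch code): the search phase of _delete_by_query runs against the last refreshed snapshot, while the version-conflict check at delete time runs against the live, per-document version state.

```python
# Toy model (not ES internals) of why _delete_by_query can return a 409:
# the search phase sees only the last refreshed snapshot, but the delete
# is version-checked against the always-current live state.

class ToyIndex:
    def __init__(self):
        self.live = {}        # doc id -> version; always current
        self.searchable = {}  # doc id -> version; updated only on refresh

    def index(self, doc_id):
        """Index/update a document; acked immediately, searchable later."""
        self.live[doc_id] = self.live.get(doc_id, 0) + 1
        return self.live[doc_id]

    def refresh(self):
        """Make all acknowledged operations visible to search."""
        self.searchable = dict(self.live)

    def delete_by_query(self):
        """Search the refreshed snapshot, then delete with version checks."""
        conflicts = 0
        for doc_id, seen_version in list(self.searchable.items()):
            # The check compares the version the search saw against the
            # live version, not against the snapshot.
            if self.live.get(doc_id) != seen_version:
                conflicts += 1  # 409 version_conflict_engine_exception
            else:
                del self.live[doc_id]
        self.refresh()
        return conflicts

idx = ToyIndex()
idx.index("doc1")             # version 1, acked
idx.refresh()                 # periodic refresh makes v1 searchable
idx.index("doc1")             # version 2 acked, but NOT yet refreshed
print(idx.delete_by_query())  # prints 1: search saw v1, live version is v2
```

Calling refresh (or using the refresh parameter on the preceding write) before _delete_by_query closes this gap, because the search phase then sees the same versions the conflict check will compare against.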

Notice that refreshing is not free. The last link above explains some of the trade-offs involved including the impact on indexing and search performance.

The default refresh interval is 1s, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#dynamic-index-settings


Hi @HenningAndersen,

So _delete_by_query basically searches for the documents to delete and then deletes them one by one. And a version conflict occurs if one or more of the documents gets updated between the time the search completed and the delete operation started.

But as I said, I had received a successful created/updated response for all the documents that have to be deleted before sending the _delete_by_query request. And I am pretty sure that none of the documents are getting updated while _delete_by_query is running.

So I am guessing that a successful creation/update does not imply that the data is successfully persisted across the primary and replica shards (and is available immediately for search), but instead is written to some kind of translog and then persisted on the required nodes once a refresh is done.

Assuming the above is correct, _delete_by_query will throw a version conflict when a refresh occurs just after the search phase (of _delete_by_query) completes and the delete phase starts.

Please correct me if I am wrong.

Hi @Rahul_Kumar3,

Your summary is close to correct. 🙂

a successful creation/update does not imply that the data is successfully persisted across the primary and replica shards

The translog really resides on the primary and replica shards. So before Elasticsearch sends back a successful response to an index request, it ensures that:

  1. The request is well-formed, has no version conflicts, and can be indexed into Lucene (i.e. all fields are valid, etc.).
  2. The request is persisted in the translog on the primary.
  3. The request is persisted in the translog on all current/alive replicas.

By default, Elasticsearch will fsync the translog before responding, so data is safely persisted when Elasticsearch responds OK to a request. For more info on the translog (and when it fsyncs) see here:
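The three-step acknowledgement above can be sketched as follows (a toy model; the function and data structures are illustrative, not ES code):

```python
# Toy sketch (not ES internals) of the ack sequence: the operation is only
# acknowledged to the client once it is validated and persisted (fsynced)
# in the translog of the primary and of every current in-sync replica.

def index_with_replication(op, primary_translog, replica_translogs):
    # 1. validate the operation (stand-in for "well-formed, indexable")
    if not isinstance(op, dict) or "id" not in op:
        return "400 Bad Request"
    # 2. persist (and fsync) in the translog on the primary
    primary_translog.append(op)
    # 3. persist (and fsync) in the translog on all in-sync replicas
    for translog in replica_translogs:
        translog.append(op)
    # only now does the client receive an acknowledgement
    return "200 OK"

primary, replicas = [], [[], []]
print(index_with_replication({"id": 1}, primary, replicas))   # prints 200 OK
print(index_with_replication("garbage", primary, replicas))   # prints 400 Bad Request
```

Note that none of these steps makes the document searchable; that only happens at the next refresh.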

_delete_by_query will throw a version conflict when a refresh occurs just after the search operation (of _delete_by_query) completes and delete operation starts

A refresh is not necessary to get the version conflict. The version check is always done against the newest state: Elasticsearch keeps track of the last version of every document ID separately in order to enforce the version-conflict check safely.
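A minimal sketch of that last point (toy Python, not ES internals): the conflict check compares the version the search phase saw against a live per-ID version map that is always current, independent of any refresh.

```python
# Toy model (not ES code): deletes are checked against the live per-ID
# version map, not against whatever snapshot the search phase read from.

live_versions = {"doc1": 2}  # newest state, tracked per document ID

def checked_delete(doc_id, expected_version):
    """Delete only if the document hasn't changed since it was read."""
    if live_versions.get(doc_id) != expected_version:
        return "409 version_conflict"  # a newer version exists
    del live_versions[doc_id]
    return "200 deleted"

print(checked_delete("doc1", 1))  # prints 409 version_conflict (live is v2)
print(checked_delete("doc1", 2))  # prints 200 deleted (versions match)
```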

Hi @HenningAndersen,

I am a bit confused here. You are saying that the translog is fsynced before responding to a request by default. And according to this document, an Elasticsearch flush is the process of performing a Lucene commit and starting a new translog. This would mean that each document is committed to Lucene before an OK response is sent to the application, hence making it immediately available for search.

So, in this scenario, the _delete_by_query search phase would find the latest version of the document. And as I mentioned previously, no documents are being updated between the time the search phase (of _delete_by_query) finishes and the delete phase starts. So ideally ES should not throw a version conflict in this case.

I was under the impression that the translog is fsynced when the refresh operation happens. This would have explained the version conflicts: the search phase (of _delete_by_query) would have found an earlier version, then the fsync occurred and the newer version became searchable, resulting in a version conflict during the delete phase.

Please let me know if I am missing something here.

I believe this is the sequence of events:

  1. The request is received at one of the nodes.
  2. The request is forwarded to the document's primary shard.
  3. The operation is performed on the primary shard, and parallel requests are sent to the replica nodes.
  4. The translog is fsynced on the primary and replica shards, which makes the operation durable. New documents are at this point not yet searchable.
  5. The primary shard node waits for a response from the replica nodes and then sends the response to the node where the request was originally received.
  6. The response is sent back to the client.
  7. The refresh interval triggers a refresh of each shard, which writes the in-memory buffer out as a new segment and opens a new searcher. The new data is now searchable. This is not coordinated across primary and replica shards.

As described these are two separate steps.
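The separation between durability (per-request translog fsync) and visibility (refresh) can be illustrated with a toy single-shard model (illustrative Python, not ES internals):

```python
# Toy single-shard model (not ES internals): an acked write is durable
# (recorded in the fsynced translog) but only becomes searchable after
# the next refresh turns the in-memory buffer into a segment.

class ToyShard:
    def __init__(self):
        self.translog = []  # durable, fsynced per request
        self.segments = []  # searchable data, produced by refresh
        self.buffer = []    # indexed but not yet refreshed

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(("index", doc))  # fsync happens before the ack
        return "200 OK"  # durable now, but not yet searchable

    def refresh(self):
        if self.buffer:
            self.segments.append(list(self.buffer))  # write a new segment
            self.buffer.clear()

    def search(self):
        return [doc for segment in self.segments for doc in segment]

shard = ToyShard()
shard.index({"id": 1})
print(shard.search())  # prints []: acked and durable, but not visible yet
shard.refresh()
print(shard.search())  # prints [{'id': 1}]: visible after refresh
```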


This sequence of events does make sense.

But according to this document, a synced flush (fsync) is a special kind of flush, which performs a normal flush and then adds a generated unique marker (sync_id) to all shards.

And according to this document, an Elasticsearch flush is the process of performing a Lucene commit and starting a new translog.

From these two documents, I concluded that the Lucene commit was happening during the fsync operation and not during the refresh operation, which created the confusion.

So the answer I am looking for is whether the Lucene commit happens during the fsync or during the refresh operation.

Documents become searchable during refresh, which writes new segments; a full Lucene commit happens at flush, not during the per-request fsync of the translog. A synced flush is a special operation and should not be confused with the fsyncing of the translog that occurs per request.

In the flow I outlined above there would be no synced flush. Only if the API was explicitly called or the shard was idle for a period of time would this occur.

Thanks @Christian_Dahlqvist,

That clears it up for me.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.