Delete by Query and Refresh Interval

Hi, I have a situation where I take the following steps:

  • retrieve lock
  • delete by query
  • refresh
  • insert documents
  • release lock

These steps could happen in quick succession. The delete by query is supposed to find all the documents previously inserted and delete them all. For the most part, this has been working fine. However, I just discovered an incident where, after these steps completed, there were more documents than expected. I haven't been able to replicate this, but I came up with the following hypothesis.

Consider the following example with refresh interval set to 1s:

Iteration 1:

  • retrieve lock
  • delete by query (nothing to delete)
  • refresh
  • insert documents with IDs 1, 2, 3, all with a property X with value of Y
  • release lock

Immediately after (less than 1s), Iteration 2 occurs:

  • retrieve lock
  • delete by query for all documents with property X with value of Y
  • refresh
  • insert documents with IDs 2, 3, 4, all with a property X with value of Y
  • release lock

The final state I expected was to have only documents with IDs 2, 3, 4, but I ended up with documents with IDs 1, 2, 3, 4.

Of course there may be other issues with my code, but my suspicion is that since there was no refresh performed between Iteration 1 and Iteration 2, and no refresh happened due to the 1s refresh interval not being reached, the delete by query failed to find and delete the documents with IDs 1, 2 and 3. Then when the insert phase of Iteration 2 happens, IDs 1, 2 and 3 already exist, so IDs 2 and 3 are updated and ID 4 is inserted.

Is this a possible explanation for my unexpected state?

Something that concerns me is that this hypothesis seems to contradict the following post: Elasticsearch delete_by_query 409 version conflict where a 409 error is expected instead. I did not receive any 409 errors in the example provided. Also, in the post, it's mentioned that the 409 error is due to the lack of refresh between the insert and delete by query. But if there's no refresh, that means the documents cannot be searched, so how can the delete by query find the document to delete in the first place?

Any insight would be greatly appreciated. Thanks!

For more info, the insert step is completed using Bulk Processor.

Delete by query only works for indexed documents (because it relies on the index to find the documents) - so you need to make sure the index is up to date before using the next delete by query step.

Why are you refreshing before inserting the documents? It seems like moving the refresh to after the insertion could solve your problem.
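To make the difference between the two orderings concrete, here is a minimal toy model in Python. It is not Elasticsearch: `ToyIndex`, `run_iteration` and `final_ids` are hypothetical names, and the only behavior modeled is the assumption that a write becomes visible to search after an explicit refresh (the periodic 1s refresh is assumed not to fire between the two iterations).

```python
class ToyIndex:
    """Toy model (assumption): writes are searchable only after refresh()."""

    def __init__(self):
        self.docs = {}        # everything written so far, keyed by ID
        self.searchable = {}  # what a search (and delete by query) can see

    def index(self, doc_id, source):
        self.docs[doc_id] = source  # create, or overwrite an existing ID

    def refresh(self):
        self.searchable = dict(self.docs)

    def delete_by_query(self, field, value):
        # Only documents that are visible to search can be matched.
        hits = [i for i, src in self.searchable.items() if src.get(field) == value]
        for doc_id in hits:
            self.docs.pop(doc_id, None)
        return len(hits)


def run_iteration(index, ids, refresh_after_insert):
    index.delete_by_query("X", "Y")
    if not refresh_after_insert:
        index.refresh()  # original order: refresh before the insert
    for doc_id in ids:
        index.index(doc_id, {"X": "Y"})
    if refresh_after_insert:
        index.refresh()  # suggested order: refresh after the insert


def final_ids(refresh_after_insert):
    index = ToyIndex()
    run_iteration(index, [1, 2, 3], refresh_after_insert)
    run_iteration(index, [2, 3, 4], refresh_after_insert)
    index.refresh()  # make the final state visible before inspecting it
    return sorted(index.searchable)


print(final_ids(refresh_after_insert=False))  # refresh before insert
print(final_ids(refresh_after_insert=True))   # refresh after insert
```

Under these assumptions, the original order leaves ID 1 behind because the second delete by query runs before documents 1, 2, 3 are visible, while refreshing after the insert lets the next delete by query find them.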

Thanks for the response! That makes sense, I agree that moving the refresh after insertion should resolve the problem.

My only open question is how come the 409 error occurs in the discussion here: Elasticsearch delete_by_query 409 version conflict

My expectation is that delete by query would just find no documents to delete since a refresh has not occurred since the documents were inserted. How can a 409 error occur if the delete by query only finds indexed documents? (I'm assuming by "indexed" you mean that a refresh has occurred after the insertion) It seems like from the discussion, the 409 error happens because somehow the delete by query finds an older version of the document that was just inserted. I'm probably missing something, but the document doesn't exist before the step of the document being indexed so how can there be a version conflict?

The only explanation I can think of is that the original poster meant that an update document request is sent instead of a create document request. Then delete by query would find the old version of the document and attempt to delete it, and a 409 error would be returned.
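That scenario can be sketched with a small toy model in Python. This is only an illustration under assumptions: `ToyVersionedIndex` and `VersionConflict` are hypothetical names, refresh is not modeled (the first version is assumed to be searchable already), and the exception merely stands in for a 409 response. The idea, as I understand the delete by query documentation, is that the matching documents are captured in a snapshot first and each delete then fails if the document changed in the meantime.

```python
class VersionConflict(Exception):
    """Stands in for an HTTP 409 response in this toy model."""


class ToyVersionedIndex:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        # Creating a document gives version 1; overwriting bumps the version.
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, source)

    def snapshot(self, field, value):
        # Phase 1 of delete by query: record matching IDs and their versions.
        return {i: v for i, (v, src) in self.docs.items() if src.get(field) == value}

    def delete_if_version(self, doc_id, expected_version):
        # Phase 2: delete only if the document is unchanged since the snapshot.
        current = self.docs.get(doc_id)
        if current is None or current[0] != expected_version:
            raise VersionConflict(doc_id)
        del self.docs[doc_id]


def delete_by_query_with_concurrent_update():
    index = ToyVersionedIndex()
    index.index(1, {"X": "Y"})       # version 1, assumed already searchable
    hits = index.snapshot("X", "Y")  # delete by query sees document 1 at version 1
    index.index(1, {"X": "Y"})       # an update arrives: document 1 is now version 2
    conflicts = []
    for doc_id, version in hits.items():
        try:
            index.delete_if_version(doc_id, version)
        except VersionConflict:
            conflicts.append(doc_id)
    return conflicts, sorted(index.docs)


conflicts, remaining = delete_by_query_with_concurrent_update()
print(conflicts, remaining)
```

In this model the conflict only arises because an older, already visible version of the document existed when the snapshot was taken, which matches the update-instead-of-create reading.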

Ah, I see - it can lead to a problem if the "delete by query" and the document insertion happen in parallel and overlap partially. If the timing is unlucky, it could lead to some of the newly inserted documents being deleted again right away (that's what the 409 is telling you).

To be really safe, you need to make sure the index is in a consistent state after both inserting and deleting. For insertion, you can ensure this by adding refresh=wait_for (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html) to the request - this will make sure the request only completes after the index is updated.
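As a rough illustration of where the parameter goes, the following Python sketch only builds the path and newline-delimited body of a bulk request with refresh=wait_for; it does not send anything. The helper name `build_bulk_request` and the index name `my-index` are assumptions made up for this example, and a client library such as the Bulk Processor would need the equivalent refresh setting configured in its own way.

```python
import json


def build_bulk_request(index_name, docs):
    """Return (path, body) for a bulk request that waits for a refresh."""
    lines = []
    for doc_id, source in docs.items():
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps(source))
    body = "\n".join(lines) + "\n"    # a bulk body must end with a newline
    path = "/_bulk?refresh=wait_for"  # respond only once the writes are searchable
    return path, body


# "my-index" is a placeholder index name for this example.
path, body = build_bulk_request("my-index", {2: {"X": "Y"}, 3: {"X": "Y"}})
print(path)
print(body, end="")
```

The trade-off is that each bulk request then takes longer to return, since it waits for the next refresh instead of forcing one.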

Thanks Joe!

If my pattern is always

  1. Delete by query
  2. Bulk insert
  3. Refresh
  4. Repeat

Then would adding a Refresh between steps 1 and 2 help in any way?

Delete by query adds tombstone records, so I would expect a refresh after this phase to help.