Cluster is running version 6.8 and we are doing a mix of search/create/update using the Node.js client.
Operations can access the same document in quick succession or concurrently, since they are driven by events coming from Kafka.
All actions are sent with refresh:true since we wish to be in sync as much as possible, and update requests are sent with retry_on_conflict set to some high number (5).
The issue: no matter how high we set the retry_on_conflict number, we still get version mismatch exceptions.
What I don't understand is where the seqNo is coming from. The document we get from searching, and then use to update, has no version/seqNo/primaryTerm in it.
Since we cannot reduce the concurrency of event handling per document, the current idea is to re-fetch the document and then redo the update logic, but I don't see how that will change anything if the seqNo/primaryTerm/version is not there anyway.
What is the correct way to handle these kinds of use cases? On the surface, since we force a refresh, it should behave as close to a synchronous database as possible. Additionally, I expected the retry_on_conflict parameter to solve the issue completely.
First, 6.8 is ancient; you should really upgrade as a matter of urgency.
You can always get the _seq_no etc. with a GET request, or, as in the docs here for search:
Note: The Search API can return the _seq_no and _primary_term for each search hit by setting the seq_no_primary_term parameter.
So if you do not set the correct values, as returned from the previous index operation, you will always get a mismatch.
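As a rough sketch of what that could look like from Node.js: the two helpers below are hypothetical (buildSearchParams and buildConditionalIndexParams are names made up here) and only build request parameters, they do not contact a cluster. The parameter names follow the REST API (seq_no_primary_term, if_seq_no, if_primary_term); whether your client version accepts them in exactly this shape is an assumption to check against its docs.

```javascript
// Hypothetical helpers: they only build request parameters, no cluster is contacted.

// Search request asking Elasticsearch to include _seq_no and _primary_term in each hit.
function buildSearchParams(index, query) {
  return {
    index,
    body: {
      seq_no_primary_term: true,
      query,
    },
  };
}

// Index request that should only succeed if the document is unchanged since `hit` was read.
function buildConditionalIndexParams(hit, newSource) {
  return {
    index: hit._index,
    type: hit._type, // 6.x still uses mapping types
    id: hit._id,
    if_seq_no: hit._seq_no,
    if_primary_term: hit._primary_term,
    body: newSource,
  };
}

// Example with a made-up search hit.
const exampleHit = {
  _index: "orders",
  _type: "_doc",
  _id: "42",
  _seq_no: 7,
  _primary_term: 1,
  _source: { status: "new" },
};
const searchParams = buildSearchParams("orders", { term: { status: "new" } });
const indexParams = buildConditionalIndexParams(exampleHit, { status: "paid" });
```

The point is simply that the values written back are the ones read from the hit, not something the client makes up.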
Optimistic concurrency control
Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.
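To make the re-fetch-and-redo idea concrete, here is a minimal sketch of a client-side loop. It runs against a tiny in-memory stand-in (FakeIndex, invented for this example), not the real Elasticsearch client, and the beforeWrite hook exists only so the example can simulate a concurrent writer. It assumes the document already exists.

```javascript
// In-memory stand-in for a single index; NOT the Elasticsearch client.
class FakeIndex {
  constructor() {
    this.docs = new Map();
    this.nextSeqNo = 0;
  }
  get(id) {
    const doc = this.docs.get(id);
    if (!doc) return null;
    return { _id: id, _seq_no: doc.seqNo, _primary_term: 1, _source: { ...doc.source } };
  }
  // Mimics if_seq_no / if_primary_term: reject with 409 when the document has moved on.
  index(id, source, { if_seq_no, if_primary_term } = {}) {
    const current = this.docs.get(id);
    if (if_seq_no !== undefined) {
      if (!current || current.seqNo !== if_seq_no || if_primary_term !== 1) {
        const err = new Error("version_conflict_engine_exception");
        err.statusCode = 409;
        throw err;
      }
    }
    const seqNo = this.nextSeqNo++;
    this.docs.set(id, { seqNo, source: { ...source } });
    return { _seq_no: seqNo, _primary_term: 1 };
  }
}

// Re-fetch, re-apply the change, write conditionally; retry only on 409.
function updateWithRefetch(store, id, applyChange, maxAttempts = 5, beforeWrite = () => {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const hit = store.get(id); // assumes the document exists
    const next = applyChange(hit._source);
    beforeWrite(attempt); // example-only hook: lets a "concurrent" writer sneak in
    try {
      store.index(id, next, { if_seq_no: hit._seq_no, if_primary_term: hit._primary_term });
      return attempt; // number of attempts it took
    } catch (err) {
      if (err.statusCode !== 409) throw err;
    }
  }
  throw new Error(`gave up after ${maxAttempts} conflicting attempts`);
}

// Usage: another writer changes the doc during the first attempt only.
const fakeIndex = new FakeIndex();
fakeIndex.index("a", { count: 0 });
const attemptsUsed = updateWithRefetch(
  fakeIndex,
  "a",
  (src) => ({ count: src.count + 1 }),
  5,
  (attempt) => {
    if (attempt === 1) fakeIndex.index("a", { count: 10 });
  }
);
const finalCount = fakeIndex.get("a")._source.count;
```

Because the change is re-applied to the freshly read source on each attempt, the interfering write is not lost: the second attempt increments the value the other writer stored.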
In newer versions there are some additional options, such as wait_for, that can help with these things.
1. True, 6.8 is indeed ancient, but I am not aware of any security/breaking issues that are fixed between the latest 6.8 and 7.17. It is planned anyway.
2. In the case that I don't fetch seqNo/primaryTerm, what is the function of retry_on_conflict? It is server-side logic as far as I can tell, but I see no way for Elasticsearch to know the "correct" primaryTerm from my request.
3. wait_for exists in 6.8 as well, and from the docs it looks like a "weaker" version of true. Is conflict resolution different when using it?
For example, if we have 3 update requests for the same document, using wait_for or true seems to lead to the same outcome: a conflict will occur and won't be resolved by ES.
Ahh I did not see wait_for in 6.8... thanks...
Which API generates the conflict, and how are you calling it?
Also, you can get the _seq_no with any GET by _id call if you wanted to build your own logic and resubmit.
Per the docs, Update is a two-phase operation, GET then Index, which would explain why you are seeing the conflict. The GET reads the _seq_no and primary term, and by the time it tries to index the doc they have changed; that is how I read it.
retry_on_conflict: In between the get and indexing phases of the update, it is possible that another process might have already updated the same document. By default, the update will fail with a version conflict exception. The retry_on_conflict parameter controls how many times to retry the update before finally throwing an exception.
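A toy model of that get-then-index loop may help show why a retry budget can still run out on a very hot document. This is only an illustration of the behaviour described in the quote, not Elasticsearch's actual implementation; simulateUpdate and interferingWrites are invented for the example.

```javascript
// Toy model of the quoted update flow: get, then index, retried on conflict.
// interferingWrites = how many attempts in a row lose the race to another writer.
function simulateUpdate(retryOnConflict, interferingWrites) {
  let seqNo = 0; // sequence number currently stored for the document
  for (let attempt = 0; attempt <= retryOnConflict; attempt++) {
    const seenSeqNo = seqNo; // "get" phase
    if (attempt < interferingWrites) {
      seqNo++; // another writer lands between get and index
    }
    if (seenSeqNo === seqNo) {
      seqNo++; // "index" phase succeeds only if nothing changed in between
      return { ok: true, attempts: attempt + 1 };
    }
  }
  // Budget exhausted: this is what surfaces to the caller as a version conflict.
  return { ok: false, attempts: retryOnConflict + 1 };
}

const calmDoc = simulateUpdate(5, 3); // loses the race 3 times, then wins
const hotDoc = simulateUpdate(5, 6);  // loses the race on every attempt
```

In this model, retry_on_conflict: 5 allows six attempts in total, so a document that is rewritten between every get and index still ends in a conflict, no matter what the client sent.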
Perhaps you are running into something else, but that would seem to be a pretty likely explanation...
Curious what rate you are calling refresh...
Perhaps a colleague of mine might have a comment @DavidTurner any insight?
Doesn't a refresh rate of 100s for updates mean any subsequent update of the same doc within 100s will result in a version conflict? How often do you write? I assume it's faster than once per 100s.
Sorry, it's not 100 seconds; the refresh rate is the default (which should be 1 second? 30 seconds? unsure).
I meant there are 100 refreshes per second for updates and 500 per second for index.
I have nothing to add, I think you covered everything. I don't remember what was or wasn't available in 6.8, it's too old, but what you say sounds reasonable for all versions that aren't past EOL.
I also suspect those high refresh rates might be contributing to the problem...
I could try to use wait_for instead of true. At this point I guess the high number of retries would take as long as waiting for the actual refresh.