I have a queue feeding a bunch of workers that are feeding into ES
with bulk index requests. In many cases, we create a document (which
goes into the queue) and then immediately update it.
Before we had multiple workers, this was not a problem. But now we have a
race condition where the updated version of the document is actually
overwritten by the earlier version of the document.
For example, if the first version of the doc is {"a":0} and the
second is {"a":1}, sometimes the second version is indexed first and
then overwritten by the first, so the stored doc ends up out of date.
Is there any way to set priority on fields?
I'm looking into rearchitecting my workers, as well.
Do you use VersionType.EXTERNAL at all? Since a field in your own data
already acts as the version, you could set the version property to that
domain-specific version instead of relying on ES's built-in versioning.
Then, when the first version tries to overwrite the second, it is rejected
with a VersionConflictException, which you can simply swallow (although the
exception handling gets a bit messy, since you only want to swallow that
one exception type, not any others).
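A minimal sketch of the external-versioning idea, assuming the rule "accept a write only if its version is strictly greater than the stored one" (the behaviour of `version_type=external`). It uses an in-memory store instead of a real ES client; `ExternalVersionStore`, `VersionConflict` and `index_ignoring_stale` are hypothetical names, not ES APIs. The `except` clause catches only the version-conflict type, so any other error still propagates.

```python
class VersionConflict(Exception):
    """Hypothetical stand-in for Elasticsearch's VersionConflictException."""


class ExternalVersionStore:
    """In-memory stand-in for an index using external versioning.

    Assumption: a write is accepted only if its version is strictly greater
    than the version already stored for that id.
    """

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version):
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(
                f"{doc_id}: version {version} <= stored {current[0]}")
        self._docs[doc_id] = (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]


def index_ignoring_stale(store, doc_id, source, version):
    """Swallow only version conflicts; every other exception propagates."""
    try:
        store.index(doc_id, source, version)
        return True
    except VersionConflict:
        return False  # a newer version is already stored; drop this write


# Out-of-order delivery: the update (version 2) arrives before the create (version 1).
store = ExternalVersionStore()
index_ignoring_stale(store, "doc1", {"a": 1}, version=2)  # accepted
index_ignoring_stale(store, "doc1", {"a": 0}, version=1)  # rejected and swallowed
print(store.get("doc1"))  # (2, {'a': 1})
```

With a real bulk request, per-item failures come back in the response rather than as thrown exceptions, so the "swallow only conflicts" check would apply to each item's result instead.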
You can use versioning to solve this, yes. Or, if you know that one
operation always creates the doc and the other always updates it, you can
set the create flag on the index operation for the first one, which makes
it fail if the document already exists.
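A minimal sketch of the create-flag approach, again with an in-memory store rather than a real ES client. The assumption is that the create flag (`op_type=create` in the REST API) rejects the write when the id already exists, while a plain index overwrites. `Store`, `DocumentExists` and `apply_op` are hypothetical names. Note this only settles create-versus-update ordering; two updates racing each other still need versioning.

```python
class DocumentExists(Exception):
    """Hypothetical stand-in for the error returned when a create hits an existing id."""


class Store:
    def __init__(self):
        self._docs = {}  # doc id -> source

    def index(self, doc_id, source, create=False):
        # Plain index overwrites; create=True fails if the id already exists.
        if create and doc_id in self._docs:
            raise DocumentExists(doc_id)
        self._docs[doc_id] = source

    def get(self, doc_id):
        return self._docs[doc_id]


def apply_op(store, op, doc_id, source):
    """op is 'create' for the initial doc and 'update' for the later version."""
    try:
        store.index(doc_id, source, create=(op == "create"))
    except DocumentExists:
        pass  # the update already landed, so the stale create is dropped


# Both delivery orders end with the updated document.
for order in ([("create", {"a": 0}), ("update", {"a": 1})],
              [("update", {"a": 1}), ("create", {"a": 0})]):
    s = Store()
    for op, src in order:
        apply_op(s, op, "doc1", src)
    print(s.get("doc1"))  # {'a': 1}
```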