Suppose I have a cluster with two Elasticsearch nodes. What happens if two INDEX requests for a document with the same document ID (but slightly different content) arrive simultaneously, each at a different node?
In my case, I use upsert to insert/update documents. Each document consists of two parts (an ORIG part and a TERM part). Both parts share the same document ID. Each part is sent to Elasticsearch in a separate request. In the happy scenario, one request (with one part) inserts the document and the other request (with the other part) updates it.
I wonder what happens if both upsert requests arrive simultaneously at two different Elasticsearch nodes?
The first thing that happens is that Elasticsearch shifts execution of the upserts onto the primary copy of the shard that is getting the document. The second thing is that it does a local Engine.Get operation for the current contents. The third thing is that it does a local Engine.Index operation. This operation is synchronized on the document's ID, taking a lock related to the modulo of the hash of the ID (held for the duration of the operation in 2.x; in 5.0 it is held for just long enough to check out a specific lock). One of the two upserts will get the lock first and it'll go first. The second one will either see the change or it won't. If it does, it is uninteresting. If it doesn't see the change, it'll issue the same Engine.Index operation that its buddy just did and it'll get a VersionConflictEngineException. If it has retry_on_conflict set to something more than 0, then it'll restart the upsert from the Engine.Get operation. Then it'll see the first one's change and it'll be boring again, just doing an update.
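To make the get / index / retry sequence above concrete, here is a minimal toy model in Python. It is not Elasticsearch code: `ToyShard`, `VersionConflict`, and `upsert` are hypothetical names made up for illustration, the lock striping is a simplification of what is described above, and the version check is only an assumption about how the conflict is detected.

```python
import threading


class VersionConflict(Exception):
    """Hypothetical stand-in for VersionConflictEngineException."""


class ToyShard:
    """Toy primary shard: doc_id -> (version, source). Not Elasticsearch code."""

    def __init__(self, lock_count=16):
        self._locks = [threading.Lock() for _ in range(lock_count)]
        self._docs = {}

    def get(self, doc_id):
        # Version 0 with no source means "document does not exist yet".
        return self._docs.get(doc_id, (0, None))

    def index(self, doc_id, source, expected_version):
        # Pick a lock from the hash of the ID, modulo the number of locks.
        lock = self._locks[hash(doc_id) % len(self._locks)]
        with lock:
            current_version, _ = self.get(doc_id)
            if current_version != expected_version:
                raise VersionConflict(doc_id)
            self._docs[doc_id] = (current_version + 1, dict(source))
            return current_version + 1


def upsert(shard, doc_id, partial, retry_on_conflict=0):
    """Get, merge, index; restart from the get when a conflict is raised."""
    attempts = 0
    while True:
        version, source = shard.get(doc_id)
        merged = {**(source or {}), **partial}
        try:
            return shard.index(doc_id, merged, expected_version=version)
        except VersionConflict:
            if attempts >= retry_on_conflict:
                raise
            attempts += 1


# Force the race by hand: both requests read before either one writes.
shard = ToyShard()
stale_a, _ = shard.get("doc-1")
stale_b, _ = shard.get("doc-1")
shard.index("doc-1", {"orig": "original text"}, stale_a)
try:
    shard.index("doc-1", {"term": "extracted terms"}, stale_b)
    saw_conflict = False
except VersionConflict:
    saw_conflict = True

# Two concurrent upserts, each allowed one retry.
shard2 = ToyShard()
parts = [{"orig": "original text"}, {"term": "extracted terms"}]
threads = [
    threading.Thread(target=upsert, args=(shard2, "doc-1", part, 1))
    for part in parts
]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_version, final_source = shard2.get("doc-1")
```

In this model the hand-forced race ends in a conflict for the second writer, while the two threaded upserts with one retry each always leave a version 2 document containing both parts, whichever thread wins the lock.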
On ES 1.5.2 I have seen concurrent updates (creates) collide and result in a version:2 document that only has the data from one of the operations. VersionConflictEngineException was not thrown. I saw this around 1% of the time when sending many concurrent requests.
Thanks. I am running ES 5. My wishful thinking is that if there was such a bug in 1.5.2, it was fixed along the way to 5.0.0.
That certainly shouldn't happen, and if you can recreate it with a modern (2.x, 5.0) version of Elasticsearch with some kind of standalone script, then file an issue!
There are ways you can make this happen that are expected, though. For example, if you send two index requests simultaneously rather than two _create requests or two _update requests. version_type also has some "fun" uses, including the quite dangerous version_type=force, which can get your primaries and replicas out of sync. Even if your reproduction turns out to be one of those expected cases, it'll probably inform some change, if just a documentation change.
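As a rough illustration of why two plain index requests can produce a version:2 document holding only one operation's data, here is a small toy sketch. `plain_index`, `create`, and `DocumentExists` are hypothetical helpers operating on an in-memory dict, not Elasticsearch APIs; they only model the difference between replacing the whole document and refusing to overwrite an existing one.

```python
class DocumentExists(Exception):
    """Hypothetical stand-in for the error a create gets on an existing ID."""


def plain_index(store, doc_id, source):
    # A plain index replaces the whole document and bumps the version.
    version, _ = store.get(doc_id, (0, None))
    store[doc_id] = (version + 1, dict(source))
    return version + 1


def create(store, doc_id, source):
    # A create refuses to overwrite a document that already exists.
    if doc_id in store:
        raise DocumentExists(doc_id)
    store[doc_id] = (1, dict(source))
    return 1


# Two plain index requests: the second silently replaces the first.
indexed = {}
plain_index(indexed, "doc-1", {"orig": "original text"})
plain_index(indexed, "doc-1", {"term": "extracted terms"})

# Two create requests: the second is rejected instead of overwriting.
created = {}
create(created, "doc-1", {"orig": "original text"})
try:
    create(created, "doc-1", {"term": "extracted terms"})
    second_create_rejected = False
except DocumentExists:
    second_create_rejected = True
```

In the toy model, `indexed["doc-1"]` ends up at version 2 with only the TERM part, with no exception raised, while the second `create` is rejected and the ORIG part survives at version 1.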
Just be OK with throwing out all your data and starting over up until we hit GA. While any one version of 5.0-alpha/5.0-beta has to be compatible with 2.x's on-disk format, they don't have to be compatible with each other. That lets us move quickly while working on 5.0. I think we've finished with all the on-disk changes, but stranger things have happened.
We also change the wire protocol in backwards-incompatible ways in 5.0's alphas and betas, so you'll need a full cluster restart to upgrade between any of those. Once we hit GA, all versions of 5.x will be wire compatible with each other, so you should be able to do a rolling restart for upgrades after that.