I would like to ask about bulk commands with external versioning. I am using the Bulk API to send a set of document requests to the server, with external versioning. Index, update and delete behave as I would expect (the node checks whether it already has this version, and if so the request is declined), but the create request completely ignores the external version and just increases the version number by one. This seems a little strange to me since it is not consistent with the other commands.
Let's say I have a document that was created with version 1 and deleted with version 2.
The request would look something like this:
POST /_bulk?refresh=wait_for
{"create":{"_index":"index","_type":"type","_id":"1","_version":1,"_version_type":"external"}}
{"data":"test data"}
{"delete":{"_index":"index","_type":"type","_id":"1","_version":2,"_version_type":"external"}}
For this example, let's also assume that I lost the connection to the server and do not know whether the commands were processed, so I send them again, assuming that version control on the nodes will take care of it.
I would expect that when I send a create with version 1 and there is already a document with version 2 on the node, the create would be declined. In fact it creates a document with version 3. The delete request that comes right after it then fails, since it has version 2 and there is already version 3 on the node.
In the end I have a document on the node that, according to the external version, should have been deleted.
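To make the retry scenario above concrete, here is a small in-memory sketch in Python. It is not Elasticsearch code: `ToyIndex`, `Entry`, `replay` and the `create_honors_external` flag are hypothetical names, and the rule "an external version must be strictly greater than the stored one" is the assumption the model is built on. It only contrasts the behaviour I observed with the behaviour I expected.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    version: int
    deleted: bool = False  # True means only a delete marker is left


class ToyIndex:
    """Toy model of one index; a sketch, not how Elasticsearch is implemented."""

    def __init__(self, create_honors_external: bool) -> None:
        self.docs: Dict[str, Entry] = {}
        self.create_honors_external = create_honors_external

    def create(self, doc_id: str, external_version: int) -> str:
        entry = self.docs.get(doc_id)
        if entry is not None and not entry.deleted:
            return "conflict: document already exists"
        if self.create_honors_external:
            # Expected behaviour: reject versions that are not newer.
            if entry is not None and external_version <= entry.version:
                return "conflict: version too old"
            new_version = external_version
        else:
            # Observed behaviour: external version ignored, counter incremented.
            new_version = 1 if entry is None else entry.version + 1
        self.docs[doc_id] = Entry(new_version)
        return f"created v{new_version}"

    def delete(self, doc_id: str, external_version: int) -> str:
        entry = self.docs.get(doc_id)
        current = 0 if entry is None else entry.version
        if external_version <= current:
            return "conflict: version too old"
        self.docs[doc_id] = Entry(external_version, deleted=True)
        return f"deleted v{external_version}"


def replay(index: ToyIndex) -> List[str]:
    """Send create(v1) + delete(v2) twice: the original request and the retry."""
    results = []
    for _ in range(2):
        results.append(index.create("1", 1))
        results.append(index.delete("1", 2))
    return results


for honors in (False, True):
    toy = ToyIndex(create_honors_external=honors)
    print(honors, replay(toy), toy.docs["1"])
```

With the flag off, the model ends with a live document at version 3, which is what I see on the node. With the flag on, it ends with a deleted document at version 2, which is what I expected.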
Sure, I can just use the index command instead of create, but what if I want the request to fail if the document is already present on the node?
This seems like a bug to me. From my point of view, create should handle versioning the same way as the other document commands. I have been looking in the guide, but I have not found any place that says external versioning is not supported for create.
OK, I may have got one detail wrong: update also does not support external versioning, strange as that seems. At least it tells us so in the response. Create, on the other hand, just ignores it.
I would expect that when I send a create with version 1 and there is already a document with version 2 on the node, the create would be declined. In fact it creates a document with version 3. The delete request that comes right after it then fails, since it has version 2 and there is already version 3 on the node.
I'm not sure I understand this. Can you provide a step-by-step example? I've tried the following and it looks fine to me:
The issue with your example is that you have used internal versioning without being aware of it. The document did not exist before you created it, so internal versioning gave it version number 1. That happens to be the same number you supplied as the external version, but the two are not related. If you had set the external version to 3, the response would still report version 1. The second call fails not because of a version conflict, but because create checks whether the document already exists.
I would say change the bulk request to something like this
It seems dumb, but it simulates multiple conflicting requests sent to the server. It should create a document with version 1 and delete it with version 2. The second create with version 1 should fail due to a version conflict, but instead it creates a document with version 3, because create uses the internal versioning system and just increments the number instead of using the external one. The second delete then fails because of a version conflict. So the expected result would be a deleted document, but instead we have a document with version 3.
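For anyone who wants to reproduce this, the four-action sequence described above could be assembled as a bulk body roughly like this. It is only a sketch: `action` and `build_bulk_body` are hypothetical helper names, and the index, type and id values are assumed to match the first example in this thread. The Bulk API expects newline-delimited JSON with a trailing newline, which is what the helper produces.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Step = Tuple[Dict[str, Any], Optional[Dict[str, Any]]]


def action(op: str, doc_id: str, version: int) -> Dict[str, Any]:
    """Action metadata line in the same shape as the first example (assumed values)."""
    return {op: {"_index": "index", "_type": "type", "_id": doc_id,
                 "_version": version, "_version_type": "external"}}


def build_bulk_body(steps: List[Step]) -> str:
    """Join metadata and optional source lines as newline-delimited JSON."""
    lines = []
    for meta, source in steps:
        lines.append(json.dumps(meta))
        if source is not None:  # delete actions carry no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


steps: List[Step] = [
    (action("create", "1", 1), {"data": "test data"}),
    (action("delete", "1", 2), None),
    (action("create", "1", 1), {"data": "test data"}),  # simulated resend
    (action("delete", "1", 2), None),                   # simulated resend
]
body = build_bulk_body(steps)
print(body)
```

The resulting string can be sent as the body of `POST /_bulk?refresh=wait_for` with any HTTP client.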
Oh, I see. Thank you for the explanation. Indeed there seems to be a difference between create and index (already exhibited with simple index operations, not limited to bulk):
I've tested with different ES versions and this distinction between create and index only happens on ES v5.0.0+. I'll check with the developer that made the changes on https://github.com/elastic/elasticsearch/pull/13955 and get back to you.
OK, we've discussed this internally: One of the goals of the PR I linked above was to drastically reduce the complexity of create vs index. This simplification included disallowing create requests with anything other than internal versioning. There is a bug in the validation logic, however, so that external versioning is simply turned into internal versioning instead. I will open a PR to fix this.
I have been looking in the guide, but I have not found any place that would say that for create the external versioning is not supported.
I will update the docs to properly reflect that.
Sure, I can just use the index command instead of create, but what if I want the request to fail if the document is already present on the node?
If you use external versioning, is that really something to care about? Can you elaborate on the use case there?
Well, create is not such a problem in the end, as I can use index. What is worse is that the update API does not support external versioning, not even when I make sure that I increase the version with every update. So I cannot use noop detection with external versioning, which is really a pity because it would help tremendously. I have started another thread about it, "detect noops with external version", to see if there is a way around it. But as for this thread, I mainly wanted to know exactly how it works and whether I was missing something. Thank you for your answer.