Hi,
The refresh mechanism leaves me confused about whether an update has actually succeeded once I get a response from ES.
Or is the update applied in real time, synchronously with the user's update request?
As many people describe it, updating a document normally executes in 4 steps:
query the old version document
construct new document
index the new document
mark the old document as deleted
Are these 4 steps executed sequentially, one by one?
Are updates to the same document executed sequentially, one by one, or concurrently?
Common update cases are as follows:
partial update by _id
POST index_123/_update/doc_id_123
{
"script":{
"source":"if(ctx._source.state == 0){ctx._source.state=1}"
}
}
In this example, can ES read the newest version of document doc_id_123 when executing this update, even if no refresh has happened since the previous update (state: -1 -> 0)?
In my test cases the updates always succeed, but I can't find any evidence proving that this type of update is real-time.
replace the existing document
POST index_123/_doc/doc_id_123
{
"state":1
}
Is this done in real time?
In my test cases the updates always succeed, but I can't find any evidence proving that this type of update is real-time.
update_by_query
POST index_123/_update_by_query
{
"query":{"term":{"state":{"value":0}}},
"script":{
"source":"{ctx._source.state=2}"
}
}
In my test cases I find that updates sometimes get lost. Maybe the query does not retrieve any documents with state=0 because the refresh has not executed yet?
These are all steps within a single update operation, so they happen sequentially. You do not get a successful response until all of them have completed.
Once the update has completed, the new document sits in the transaction log. That means you will get it if you request the document by ID, but it will only become available for searches once a refresh occurs.
If multiple operations are sent within one bulk request, they are, as far as I know, applied sequentially for a specific document. If they are sent as separate requests they run concurrently, which is why you can see version conflict errors when you update the same document at high frequency.
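To make the version errors mentioned above more concrete, here is a minimal Python sketch of optimistic concurrency with a retry loop. It is a toy in-memory model, not Elasticsearch code: `ToyDocStore`, `VersionConflict` and `update_with_retry` are hypothetical names made up for illustration. In Elasticsearch itself, the Update API accepts a `retry_on_conflict` parameter that asks the server to do a similar retry for you.

```python
class VersionConflict(Exception):
    """Toy stand-in for the conflict error returned on a stale write."""


class ToyDocStore:
    """Hypothetical in-memory store holding one document and its version."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1

    def get(self):
        # Return a copy so callers cannot mutate the stored document directly.
        return dict(self.source), self.version

    def index_if_version(self, new_source, expected_version):
        # Compare-and-set: reject the write if another writer got in first.
        if self.version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.source = dict(new_source)
        self.version += 1


def update_with_retry(store, mutate, retries=5):
    """Read, modify, write; on a conflict, re-read and try again."""
    for _ in range(retries + 1):
        source, version = store.get()
        mutate(source)
        try:
            store.index_if_version(source, version)
            return True
        except VersionConflict:
            continue
    return False
```

Each attempt re-reads the latest version before writing, so a concurrent writer causes a retry rather than a silently overwritten update.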
Yes. The document is retrieved by ID so it does not need to be available to search through a refresh.
Updates or creations are done in real time.
Once you have the answer from Elasticsearch after call 1 or call 2, the index is now updated.
BUT: if you GET the document by ID, you will see the change immediately. If you search (the _search API), you might not see the new version of the document until the next refresh happens. You can always force a refresh manually by calling the _refresh API, but don't do that for every update in production. It is fine, though, to do it in integration tests before checking that documents have been updated.
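The difference between GET by ID and search can be pictured with a small Python toy model. This is only a sketch of the visibility rule described above, not how Elasticsearch is implemented; `ToyShard` and its methods are hypothetical names. Against a real cluster, index and update requests also accept a `refresh=wait_for` parameter, which makes the request wait until the change is visible to search.

```python
class ToyShard:
    """Hypothetical model: writes are readable by ID at once, searchable after refresh."""

    def __init__(self):
        self._latest = {}      # every acknowledged write, keyed by document ID
        self._searchable = {}  # the view that searches see; rebuilt on refresh

    def index(self, doc_id, source):
        # The write is acknowledged here; a GET by ID sees it immediately.
        self._latest[doc_id] = dict(source)

    def get(self, doc_id):
        # Real-time read by ID: no refresh needed.
        return self._latest.get(doc_id)

    def refresh(self):
        # Make everything written so far visible to searches.
        self._searchable = {k: dict(v) for k, v in self._latest.items()}

    def search(self, field, value):
        # Searches only look at the last refreshed view.
        return sorted(
            doc_id
            for doc_id, source in self._searchable.items()
            if source.get(field) == value
        )
```

In this model a test would call `refresh()` once after its writes and before its search assertions, which mirrors the advice above about calling _refresh in integration tests only.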
As for update by query, it does the following:
A scrolled search behind the scenes, working on a "snapshot" of the list of documents that were available to search when you started the request
For each document, it runs an update (using a _bulk call). After each bulk call, some documents will already look updated while the remaining ones will not yet.
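The two steps above would also explain the "lost" updates from the question: a document that was written but not yet refreshed is not in the snapshot, so it is never selected. Here is a rough Python sketch of that behaviour under simplified assumptions. The function `toy_update_by_query` and the two dictionaries are hypothetical, and the real API additionally handles version conflicts, which this toy ignores.

```python
def toy_update_by_query(searchable, latest, predicate, mutate, batch_size=2):
    """Toy model: snapshot the matching IDs first, then update them batch by batch.

    searchable: documents visible to search (state as of the last refresh)
    latest:     the newest version of every document, keyed by ID
    """
    # Step 1: the snapshot is taken once, from the searchable view only.
    snapshot = [
        doc_id for doc_id, source in sorted(searchable.items()) if predicate(source)
    ]

    # Step 2: update the snapshotted documents in batches (one "bulk" per batch).
    updated = 0
    for start in range(0, len(snapshot), batch_size):
        for doc_id in snapshot[start:start + batch_size]:
            mutate(latest[doc_id])
            updated += 1
    return updated
```

In this model, refreshing before the update by query (so that `searchable` matches `latest`) is what makes the freshly written documents eligible.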