Assume that, in response to a user request, I make several updates/inserts/deletes to Elasticsearch from different parts of my code, and I want to make sure they are visible before I return a success status to the user. Setting a refresh policy of WAIT_UNTIL on each change may delay the response considerably, because the changes run sequentially and each one waits for its own refresh. Another solution is to send a RefreshRequest once all the changes are done, but that forces an extra refresh and may add overhead to the system. A better solution may be to just wait until the next refresh and then send the response to the user. However, I did not find a clean way to do that. It may be a good idea to add an option to RefreshRequest that behaves like WAIT_UNTIL and does not force a refresh.
?refresh=wait_for waits for the changes to become visible before replying to the client. It doesn't force an immediate refresh; rather, it waits for the next refresh to happen. The default refresh interval is 1s.
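For illustration, here is a minimal sketch of what such a request could look like over the REST API, built with the JDK's `java.net.http` classes. The host `http://localhost:9200`, the index name `orders`, and the class and method names are assumptions made up for this example; the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class WaitForIndexRequest {
    // Builds (but does not send) an index request carrying refresh=wait_for,
    // so the call would return only after a refresh has made the document visible.
    static HttpRequest build(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_doc/" + id + "?refresh=wait_for"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host, index, id, and document body.
        HttpRequest request = build("http://localhost:9200", "orders", "1", "{\"status\":\"paid\"}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In the Java high-level API the equivalent setting is the WAIT_UNTIL refresh policy on the write request.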
As far as I know, wait_for can be used on insert/update/delete requests. As I said, if I use it for every change, 10 changes can take about 10 seconds. An alternative is to not use wait_for and to refresh only once, after all changes are performed. However, this can cause one refresh per user request. A better solution may be to have some mechanism for waiting until the next refresh (without making any changes). A good option may be to add an option to RefreshRequest that only waits and does not start a real refresh.
What @wangqinghuan wrote is true. This is the option you want, IMO.
So RefreshRequest has a wait_for option?
WAIT_UNTIL option. See
I KNOW that there is a WAIT_UNTIL option for changes.
I want it on a refresh request.
Oh. I see.
Why not on the index operation then? I mean, if I understood correctly, you want to display a result page to the user after a refresh has happened, right?
You can bulk all operations and wait for the refresh to happen, no?
You do not need to make a refresh request. Just set WAIT_UNTIL on the index operation and wait for the refresh to happen (Elasticsearch refreshes automatically and periodically).
Let me clarify. Assume that, in response to user requests, we run a Java method which performs some tasks, including inserting/updating/deleting in Elasticsearch, and sends back a response to the user. We want to send the response only when all of the changes are visible to further requests. We have some choices:
- Putting ?refresh=wait_for on every request (RefreshPolicy=WAIT_UNTIL in the Java API): this can make the application very slow. For example, if 10 requests are sent to Elasticsearch one after another, we may need to wait about 10 seconds (up to 1 second for each request).
- We can collect all the insert/update/delete requests, send them to Elasticsearch as one bulk request, and set refresh policy=WAIT_UNTIL on the whole bulk request. This solution may be good if all of the requests are sent from a single method. But if they are scattered across the code (e.g., the initial method calls a method in another class, which itself calls methods in other classes and so on, and each of these methods may need to perform some insert/update/delete), this solution may not be easy to implement and can complicate the code a lot.
- After we have performed all of the inserts/updates/deletes, and right before sending the response back to the user in the initial Java method, we send a RefreshRequest to Elasticsearch so that it performs a refresh. This may cause some overhead and reduce performance if there are many user requests per second.
- The last choice, which is my suggestion but which Elasticsearch does not seem to support, is to send a request to Elasticsearch that does not force a refresh but just waits until a refresh is done and then returns (similar to setting refresh policy=WAIT_UNTIL on an update/insert/delete request, except that here there is no insert/update/delete request and we just want to wait for a refresh). This could be implemented, e.g., as an option in RefreshRequest that specifies that we do not want to force a refresh but instead want to wait for the next one. In the example scenario, the wait-for-refresh request can be sent to Elasticsearch inside the initial method after all of the inserts/updates/deletes are performed and right before sending the response back to the user.
I hope it's clear now.
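To put rough numbers on the waiting cost described above, the following sketch compares the worst-case wait when every write waits for its own refresh against the case where all writes wait on one shared refresh. It is a back-of-the-envelope model only: it assumes the default 1s refresh interval mentioned earlier, ignores indexing and network time, and the class and method names are invented for this example.

```java
class RefreshWaitEstimate {
    // Worst case when writes are sent one after another and each waits
    // for the next scheduled refresh before the following write starts.
    static long sequentialWaitMillis(int writes, long refreshIntervalMillis) {
        return writes * refreshIntervalMillis;
    }

    // Worst case when all writes wait on the same refresh
    // (one bulk with wait_for, or a single wait at the end).
    static long singleWaitMillis(long refreshIntervalMillis) {
        return refreshIntervalMillis;
    }

    public static void main(String[] args) {
        long interval = 1000; // assumed default refresh interval of 1s
        System.out.println("sequential: " + sequentialWaitMillis(10, interval) + " ms");
        System.out.println("single wait: " + singleWaitMillis(interval) + " ms");
    }
}
```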
It is clear, and I can't think of a way to do option 4 within Elasticsearch today. Option 2 sounds like the preferred solution. Option 4 would result in lots of small indexing requests, but Elasticsearch performs better when receiving larger bulk requests. Thus if performance is a concern it's probably best to collect your requests into fewer, larger bulks on the client, and that effectively means doing most of the plumbing needed to do option 2 anyway.
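As a rough sketch of the client-side plumbing that option 2 implies, the writes made in different classes could be recorded in one request-scoped collector and flushed as a single bulk body at the end. The `PendingWrites` class and its methods are hypothetical names, not part of any Elasticsearch client; the sketch only builds the newline-delimited body that would be posted to `_bulk?refresh=wait_for`, and it assumes ids and index names need no JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

class PendingWrites {
    private final List<String> lines = new ArrayList<>();

    // Records an index action; sourceJson must be a single-line JSON object.
    void index(String index, String id, String sourceJson) {
        lines.add("{\"index\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}");
        lines.add(sourceJson);
    }

    // Records a delete action; deletes carry no source line.
    void delete(String index, String id) {
        lines.add("{\"delete\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}");
    }

    // Renders the bulk body: one JSON object per line, each line newline-terminated.
    String toBulkBody() {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        PendingWrites writes = new PendingWrites();
        // In a real application these calls would come from different classes
        // that all receive the same PendingWrites instance.
        writes.index("orders", "1", "{\"status\":\"paid\"}");
        writes.delete("carts", "7");
        System.out.print(writes.toBulkBody());
    }
}
```

Passing one collector through the call chain (or holding it in a request-scoped context) keeps the individual methods unaware of when the bulk is actually sent, which is most of the complication the second choice in the list worries about.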
Isn't it better to allow the developer to choose if he wants convenience or maximum performance?
Lots of small writes and updates can be a lot slower than using bulk requests. If you just want to evaluate the performance of option 4 and compare it to the other options, why not index a single small document with refresh=wait_for (WAIT_UNTIL in the Java API) set at the end of the transaction? This will result in a lot of small artificial documents in the index, but for a performance test that may be fine. You could also periodically clean them up. Make sure you do not update the same document, though, as that can cause a refresh to happen.
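A minimal sketch of that marker-document idea, again only building the request with the JDK's `java.net.http` classes: posting to `/<index>/_doc` without an id lets Elasticsearch generate a new id each time, so the same document is never updated. The host, the index name, the `created_at` field, and the class name are assumptions for this example. Note also that refreshes happen per shard, so the marker should go to the same index as the real writes, and on a multi-shard index this is only an approximation.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

class RefreshMarker {
    // Builds (but does not send) a request that indexes a tiny marker document
    // with an auto-generated id and waits for the next refresh before returning.
    static HttpRequest build(String baseUrl, String index, Instant now) {
        // created_at is a hypothetical field that a cleanup job could filter on.
        String body = "{\"created_at\":\"" + now + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_doc?refresh=wait_for"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200", "orders", Instant.now());
        System.out.println(request.method() + " " + request.uri());
    }
}
```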