RefreshRequest with WAIT_UNTIL mode

shayantabrizi · June 18, 2019, 4:31am

Assume that I make several updates/inserts/deletes to Elasticsearch in response to a user request, in different parts of my code, and I want to make sure they are visible to the user before I return a success status. Setting a refresh policy of WAIT_UNTIL on each of the changes may cause considerable delay in the response because they run sequentially. Another solution is to send a RefreshRequest once all the changes are done, but that may add overhead to the system. A better solution may be to just wait until the next refresh and then send the response to the user. However, I did not find a clean way to just wait until the next refresh. It may be a good idea to add an option to RefreshRequest that behaves like WAIT_UNTIL and does not force a refresh.

wangqinghuan · June 18, 2019, 5:43am

?refresh=wait_for will wait for the changes to be made visible before replying to the client. It doesn't force a refresh immediately; rather, it waits for a refresh to happen. The default refresh interval is 1s.

shayantabrizi · June 18, 2019, 5:50am

As far as I know, wait_for can be used on insert/update/delete requests. As I said, if I use it for all the changes, 10 changes can take about 10 seconds, for example. An alternative is to not use wait_for and refresh only once when all changes have been performed. However, this can cause a refresh per user request. A better solution may be to have some mechanism for waiting until the next refresh (without making any changes). A good option may be to add an option to RefreshRequest that only waits and does not start a real refresh.
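
The "10 changes, about 10 seconds" estimate above can be written out as a small worked calculation. This is only a rough upper bound under stated assumptions: every write blocks until the next scheduled refresh, writes are sent strictly one after another, and indexing time itself is ignored. The class and method names are hypothetical.

```java
import java.time.Duration;

/** Hypothetical helper: worst-case wait estimates, ignoring indexing time itself. */
final class RefreshWaitEstimate {

    /** Each of the sequential writes may wait up to one full refresh interval. */
    static Duration sequentialWorstCase(int writes, Duration refreshInterval) {
        return refreshInterval.multipliedBy(writes);
    }

    /** A single wait (for example one bulk request) waits at most one interval. */
    static Duration singleWaitWorstCase(Duration refreshInterval) {
        return refreshInterval;
    }
}
```

With the default 1s interval, `sequentialWorstCase(10, Duration.ofSeconds(1))` gives 10 seconds, against 1 second for a single wait.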

dadoonet · June 18, 2019, 6:08am

What @wangqinghuan wrote is true. This is the option you want IMO.

shayantabrizi · June 18, 2019, 6:09am

So RefreshRequest has a wait_for option?

dadoonet · June 18, 2019, 6:36am

That's the WAIT_UNTIL option. See

https://github.com/elastic/elasticsearch/blob/master/client/rest-high-level/src/main/java/org/elasticsearch/client/security/RefreshPolicy.java

shayantabrizi · June 18, 2019, 6:38am

I KNOW that there is a WAIT_UNTIL option for changes.

I want it on a refresh request.

dadoonet · June 18, 2019, 6:52am

Oh. I see.

Why not on the index operation then? If I understood correctly, you want to display a result page to the user after a refresh has happened, right?
You can bulk all operations and wait for the refresh to happen, no?

wangqinghuan · June 18, 2019, 7:03am

You do not need to make a refresh request. Just set wait_until on the index operation and wait for the refresh to happen (Elasticsearch refreshes automatically and periodically).

shayantabrizi · June 24, 2019, 10:16am

Let me clarify. Assume that in response to user requests we run a Java method which performs some tasks, including inserting/updating/deleting in Elasticsearch, and sends back a response to the user. We want to send the response when all of the changes are visible to further requests. We have some choices:

1. Putting ?refresh=wait_for on every request (RefreshPolicy=WAIT_UNTIL in the Java API): this can make the application very slow. For example, if 10 requests are sent to Elasticsearch, we may need to wait about 10 seconds (1 second for each request).
2. We can collect all the insert/update/delete requests, send them as a bulk request to Elasticsearch, and set refresh policy=WAIT_UNTIL on the whole bulk request. This solution may be good if all of the requests are sent in a single method. But if they are scattered across the code (e.g., the initial method calls another method in another class, which in turn calls other methods in other classes and so on, and each of these methods may need to perform some insert/update/delete), this solution may not be easy to implement and can complicate the code a lot.
3. After we have performed all of the inserts/updates/deletes, and when we want to send the response back to the user in the initial Java method, we send a RefreshRequest to Elasticsearch so that it performs a refresh. This may cause some overhead and reduce performance if there are many user requests per second.
4. The last choice, which is my suggestion but which Elasticsearch does not seem to support, is to send a request to Elasticsearch that does not force a refresh but just waits until a refresh is done and then returns (similar to setting refresh policy = WAIT_UNTIL on an update/insert/delete request, but here there is no insert/update/delete request and we just want to wait for a refresh). This could be implemented, e.g., as an option in RefreshRequest that specifies that we do not want to force a refresh but instead want to wait for the next refresh. In the example scenario, the wait-for-refresh request can be sent to Elasticsearch inside the initial method after all of the inserts/updates/deletes are performed and right before sending the response back to the user.

I hope it's clear now.

DavidTurner · June 24, 2019, 8:57pm

It is clear, and I can't think of a way to do option 4 within Elasticsearch today. Option 2 sounds like the preferred solution. Option 4 would result in lots of small indexing requests, but Elasticsearch performs better when receiving larger bulk requests. Thus if performance is a concern it's probably best to collect your requests into fewer, larger bulks on the client, and that effectively means doing most of the plumbing needed to do option 2 anyway.
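
The client-side plumbing for option 2 might look roughly like the sketch below. It is not an Elasticsearch API: `PendingWrites` and `bulkSender` are hypothetical names, and the sketch assumes one collector per user request that is passed to (or otherwise reachable from) the scattered methods. In real code the sender would presumably build one BulkRequest, set WriteRequest.RefreshPolicy.WAIT_UNTIL on it, and execute it; that part is left out so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical per-user-request collector. Methods scattered across the code
 * add their writes here instead of sending them to Elasticsearch one by one.
 * Not thread-safe: intended to be used by a single request-handling thread.
 */
final class PendingWrites<T> {
    private final List<T> operations = new ArrayList<>();

    /** Called from any method that would otherwise index/update/delete directly. */
    void add(T operation) {
        operations.add(operation);
    }

    int size() {
        return operations.size();
    }

    /**
     * Hands all collected operations to the sender as one batch.
     * The buffer is cleared only after the sender returns normally, so a
     * failed send leaves the operations in place for a retry.
     * Returns the number of operations sent.
     */
    int flush(Consumer<List<T>> bulkSender) {
        if (operations.isEmpty()) {
            return 0;
        }
        List<T> batch = new ArrayList<>(operations);
        bulkSender.accept(batch); // assumption: one bulk request with WAIT_UNTIL
        operations.clear();
        return batch.size();
    }
}
```

The initial method would create the collector, let the nested calls `add(...)` to it, and call `flush(...)` once right before responding, so the user request waits for at most one refresh.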

shayantabrizi · July 2, 2019, 5:50am

Isn't it better to allow the developer to choose if he wants convenience or maximum performance?

Christian_Dahlqvist · July 2, 2019, 6:01am

Lots of small writes and updates can be a lot slower than using bulk requests. If you just want to evaluate the performance of option 4 and compare it to the other options, why not index a single small document with the wait_for flag set at the end of the transaction? This will result in a lot of small artificial documents in the index, but for a performance test that may be fine. You could also periodically clean them up. Make sure you do not update the same document, though, as that can cause a refresh to happen.
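
A minimal sketch of that marker-document idea, using only the JDK's HTTP client types (Java 11+). Assumptions: the `RefreshMarker` class, the document body, and the 7.x-style `/<index>/_doc` endpoint are illustrative choices, not something specified in this thread. The request is only built here, not sent. POST without an ID lets Elasticsearch generate one, so no existing document gets updated. Since refreshes happen per shard, the marker would need to go to the same index as the real writes to say anything about them, and even then this is meant for a performance comparison rather than as a guarantee.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a "marker document" request for testing option 4. */
final class RefreshMarker {

    /**
     * Builds (but does not send) a request that indexes a tiny throwaway
     * document and asks Elasticsearch to reply only once a refresh has made
     * it visible (?refresh=wait_for), without forcing a refresh.
     */
    static HttpRequest build(String baseUrl, String index) {
        URI uri = URI.create(baseUrl + "/" + index + "/_doc?refresh=wait_for");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"marker\":true}"))
                .build();
    }
}
```

The request would be sent with `java.net.http.HttpClient` at the end of the transaction, after all the regular (non-waiting) writes have been issued.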

system · July 30, 2019, 6:01am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
