Use of wait_until for deleteByQuery in the Java REST client

I have the following code for a deletion (ES 7.13):

    final DeleteByQueryRequest delQreq =
            new DeleteByQueryRequest(targetIndex);

    final ESQueryFilter esq = getFilters(null);

    final QueryBuilder qb = ...;

    delQreq.setConflicts("proceed");
    delQreq.setQuery(qb);

    final BulkByScrollResponse bulkResponse =
            getClient().deleteByQuery(delQreq, RequestOptions.DEFAULT);

I am trying to figure out how to delay the return from that last call until the index is updated, that is, to do the same as I can when indexing a document, e.g.

      req.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);

While the documentation seems to suggest that this is possible for deleteByQuery, I can't find any examples or methods.

How can I accomplish this?

It isn't possible to wait_until on any of the reindex-like requests in any client. ES doesn't support it.

According to this: https://www.elastic.co/guide/en/elasticsearch/reference/7.14/docs-refresh.html

refresh is supported by the Index, Update, Delete and Bulk APIs. One of the options is wait_for, which I assume is the equivalent of the Java API's WAIT_UNTIL refresh option.

Isn't the code above just using the bulk api?

So I think you're saying the deleteByQuery is a reindex operation which doesn't support WAIT_UNTIL.

Is there any way to achieve this?

I could run the query and delete each entity with the delete API, but I don't want to wait for each deletion. Would waiting for the last one have the desired effect, i.e. will I see all deletions if I wait only for the last?

Delete by query iterates the search API and executes the bulk API, yes. It could set refresh=wait_for but I never implemented that because I figured most of these index many results and it isn't generally worth it in that case.

If it'd be super useful to you then open an issue and explain it. I don't think it's super hard but it'd be very slow if there are many bulk writes.

It would be useful - in my case there aren't many results - it's calendar data and there might be a number of instances for recurring events.

I'll open an issue and try to figure out a workaround in the meantime.

Thanks.

If you don't perform the delete by query very frequently then you can just set refresh to true (setRefresh(true) on the DeleteByQueryRequest in the Java client) and not worry about the tiny segments you end up with. If the delete by query is very large you can also set refresh to true, because the delete is already quite expensive compared to the refresh.

If the delete by query is fairly small and fairly frequent then you shouldn't set refresh to true. And you can't set it to wait_for. If that's you it's worth opening an issue.

If you wanted to do the whole scroll/bulk delete thing yourself you could. We did it by hand back in the old days. If you put refresh=wait_for on the bulk requests it'd wait for each one to become visible.

But if you have many bulk requests you'd be waiting a while after each one. It isn't safe to just set refresh=wait_for on the last one. It'll mostly work, but bulk only waits on the shards it touches. If the last bulk doesn't include a shard then we won't wait for it. OTOH if you have a couple of thousand element bulks you may as well perform an explicit refresh after they are all done. And that is what delete by query does when you set refresh to true.
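To make the "bulk only waits on the shards it touches" point concrete, here is a minimal sketch in plain Java. It does not talk to Elasticsearch at all: the class name, the document ids and the routing function (a simple `floorMod` of `String.hashCode()`) are assumptions made up for illustration, and real Elasticsearch routing uses a different hash. It splits a list of ids into bulks and reports which shards received deletes from earlier bulks but are not touched by the final one, i.e. the shards a refresh=wait_for on the last bulk alone would not wait on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class LastBulkCoverage {

    // Hypothetical routing for illustration only; Elasticsearch hashes the
    // routing value differently.
    static int shardFor(String id, int shardCount) {
        return Math.floorMod(id.hashCode(), shardCount);
    }

    // Split the ids into consecutive bulks of at most bulkSize items.
    static List<List<String>> partition(List<String> ids, int bulkSize) {
        List<List<String>> bulks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += bulkSize) {
            int end = Math.min(i + bulkSize, ids.size());
            bulks.add(new ArrayList<>(ids.subList(i, end)));
        }
        return bulks;
    }

    // The set of shards that one bulk writes to.
    static Set<Integer> shardsTouched(List<String> bulk, int shardCount) {
        Set<Integer> shards = new TreeSet<>();
        for (String id : bulk) {
            shards.add(shardFor(id, shardCount));
        }
        return shards;
    }

    // Shards written by any bulk that the last bulk does not touch. Waiting
    // only on the last bulk would not wait for these shards.
    static Set<Integer> shardsNotWaitedOn(List<List<String>> bulks, int shardCount) {
        Set<Integer> missed = new TreeSet<>();
        if (bulks.isEmpty()) {
            return missed;
        }
        for (List<String> bulk : bulks) {
            missed.addAll(shardsTouched(bulk, shardCount));
        }
        missed.removeAll(shardsTouched(bulks.get(bulks.size() - 1), shardCount));
        return missed;
    }

    public static void main(String[] args) {
        List<List<String>> bulks = partition(List.of("a", "b", "c", "d", "e"), 2);
        System.out.println("bulks: " + bulks);
        System.out.println("shards the last bulk does not cover: "
                + shardsNotWaitedOn(bulks, 3));
    }
}
```

With these made-up ids, three shards and a bulk size of two, the last bulk holds only "e", so under the assumed routing shards 0 and 1 are left uncovered. An explicit refresh after all the bulks avoids the problem because it is not tied to the shards of any single bulk.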

Thanks for the info.

Yes I'm the small and fairly frequent case. I'll open an issue.
