I have the following code for a deletion (ES 7.13):
```java
final DeleteByQueryRequest delQreq =
        new DeleteByQueryRequest(targetIndex);
final ESQueryFilter esq = getFilters(null);
final QueryBuilder qb = ...;
delQreq.setConflicts("proceed");
delQreq.setQuery(qb);
final BulkByScrollResponse bulkResponse =
        getClient().deleteByQuery(delQreq, RequestOptions.DEFAULT);
```
I am trying to figure out how to delay the return from that last call until the index is updated, i.e. to do what I can already do when indexing a document. The docs say that refresh is supported by the Index, Update, Delete and Bulk APIs. One of its options is wait_for, which I assume is the equivalent of the WAIT_UNTIL refresh policy in the Java API.
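To make that assumed correspondence concrete: in the Java client the per-request setting is `WriteRequest.RefreshPolicy` (set, for example, with `indexRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL)`), and its three values correspond to the REST `refresh` values `false`, `true` and `wait_for`. The sketch below does not use the Elasticsearch client at all. `RefreshMode` and `fromRest` are hypothetical names for a self-contained stand-in that only mirrors that mapping. Treating a missing parameter as `NONE` is an assumption of the sketch.

```java
// Hypothetical stand-in (NOT the Elasticsearch class) mirroring the three values of
// the Java client's WriteRequest.RefreshPolicy and the REST "refresh" strings they map to.
enum RefreshMode {
    NONE("false"),          // default: the change becomes visible after the next periodic refresh
    IMMEDIATE("true"),      // refresh the affected shards right after the write
    WAIT_UNTIL("wait_for"); // hold the response until a refresh has made the write visible

    private final String restValue;

    RefreshMode(String restValue) {
        this.restValue = restValue;
    }

    String restValue() {
        return restValue;
    }

    // Map a REST "refresh" value to a mode. In the REST API an empty value means "true";
    // treating an absent (null) parameter as NONE is an assumption of this sketch.
    static RefreshMode fromRest(String value) {
        if (value == null) {
            return NONE;
        }
        if (value.isEmpty()) {
            return IMMEDIATE;
        }
        for (RefreshMode mode : values()) {
            if (mode.restValue.equals(value)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown refresh value: " + value);
    }
}
```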
So I think you're saying that delete by query is a reindex-style operation, which doesn't support WAIT_UNTIL.
Is there any way to achieve this?
I could run the query myself and delete each document with the Delete API, but I don't want to wait for each deletion. Would waiting for the last one have the desired effect, i.e. will I see all the deletions if I wait for the last one?
Yes, delete by query pages through the matching documents with the search API and deletes them with the bulk API. It could set refresh=wait_for, but I never implemented that because I figured most delete by query requests match many documents, and in that case it generally isn't worth it.
If it would be really useful to you, open an issue and explain your use case. I don't think it's very hard to add, but it would be very slow if there are many bulk writes.
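A toy, in-memory model may help picture the mechanism described here. It is not Elasticsearch code: `ToyIndex`, `deleteByQuery` and `batchSize` are hypothetical names, and a "refresh" simply copies the latest state into the set that searches see. It illustrates three points from the reply: deletes go out in page-sized bulks, they stay invisible to searches until a refresh, and `refresh = true` costs one refresh at the end rather than one per bulk.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model (hypothetical, not Elasticsearch code) of delete by query:
// page through the search hits, send one bulk delete per page, optionally refresh once at the end.
class ToyIndex {
    private final Set<String> live = new LinkedHashSet<>();    // latest state, including unrefreshed writes
    private final Set<String> visible = new LinkedHashSet<>(); // what searches currently see
    int refreshCount = 0;

    void add(String id) {
        live.add(id);
    }

    // Make all pending writes visible to searches.
    void refresh() {
        visible.clear();
        visible.addAll(live);
        refreshCount++;
    }

    Set<String> search(Predicate<String> query) {
        Set<String> hits = new LinkedHashSet<>();
        for (String id : visible) {
            if (query.test(id)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Returns the number of bulk requests issued.
    int deleteByQuery(Predicate<String> query, int batchSize, boolean refresh) {
        // Like a scroll, work from a snapshot of the matches taken up front.
        List<String> hits = new ArrayList<>(search(query));
        int bulks = 0;
        for (int start = 0; start < hits.size(); start += batchSize) {
            List<String> page = hits.subList(start, Math.min(start + batchSize, hits.size()));
            live.removeAll(page); // one "bulk" of deletes, not yet visible to searches
            bulks++;
        }
        if (refresh) {
            refresh(); // a single explicit refresh after all bulks, not one per bulk
        }
        return bulks;
    }
}
```

With 2,500 documents (`doc-0` to `doc-2499`), a query matching the 250 ids that end in `0`, and a batch size of 100, this issues 3 bulks. With `refresh` set to `true` a following search finds none of those documents; with `false` it still returns all 250.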
If you don't run the delete by query very frequently, you can just set refresh to true and not worry about the tiny segments you end up with. If the delete by query is very large, you can also set refresh to true, because the delete is already quite expensive compared to the refresh.
If the delete by query is fairly small and fairly frequent, you shouldn't set refresh to true, and you can't set it to wait_for. If that's your situation, it's worth opening an issue.
If you want to do the whole scroll/bulk delete thing yourself, you can; we did it by hand back in the old days. If you put refresh=wait_for on each bulk request, it'll wait for each one to become visible.
But if you have many bulk requests, you'd be waiting a while after each one. It isn't safe to set refresh=wait_for on just the last one. It'll mostly work, but a bulk request only waits on the shards it touches: if the last bulk doesn't include a shard, it won't wait for that shard. On the other hand, if you have a couple of thousand-element bulks, you may as well perform an explicit refresh after they are all done. That is what delete by query does when you set refresh to true.
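The shard-coverage caveat can be checked with a small sketch. It assumes simplified routing, `docId mod numShards` (real Elasticsearch hashes the routing value, but the argument does not depend on the hash function), and the class and method names are hypothetical. It reports the shards that received deletes in earlier bulks but not in the last one; a wait_for on the last bulk alone would not wait for those shards to refresh.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Why refresh=wait_for on only the final bulk is not enough.
// Assumption: routing simplified to floorMod(docId, numShards).
class LastBulkCoverage {

    static int shardFor(int docId, int numShards) {
        return Math.floorMod(docId, numShards);
    }

    static Set<Integer> shardsTouched(List<Integer> docIds, int numShards) {
        Set<Integer> shards = new TreeSet<>();
        for (int id : docIds) {
            shards.add(shardFor(id, numShards));
        }
        return shards;
    }

    // Shards written by some bulk but not by the last one, i.e. shards that a
    // wait_for on the last bulk would not wait for.
    static Set<Integer> shardsNotWaitedFor(List<Integer> docIds, int batchSize, int numShards) {
        if (docIds.isEmpty()) {
            return new TreeSet<>();
        }
        int lastStart = ((docIds.size() - 1) / batchSize) * batchSize; // first index of the last bulk
        Set<Integer> all = shardsTouched(docIds, numShards);
        all.removeAll(shardsTouched(docIds.subList(lastStart, docIds.size()), numShards));
        return all;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i <= 2000; i++) {
            ids.add(i); // 2,001 docs -> bulks of 1,000, 1,000 and 1
        }
        System.out.println(shardsNotWaitedFor(ids, 1000, 5)); // prints [1, 2, 3, 4]
    }
}
```

With 2,001 documents on 5 shards and bulks of 1,000, the last bulk holds a single document (id 2000) and touches only shard 0, so shards 1 to 4 would not be waited for. That is why a single explicit refresh after all bulks, which is what refresh=true gives you, is the safer option.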