When I call delete_by_query with ?refresh=wait_for, I get this warning:
_delete_by_query/?refresh=wait_for] returned 1 warnings: [299 Elasticsearch-5.4.1-2cfe0df "Expected a boolean [true/false] for request parameter [refresh] but got [wait_for]
Is it supported? The docs say it is.
And when I call delete_by_query with ?retry_on_conflict=3, it fails with an invalid parameter error.
I wanted to use these because when I delete records, some deletes are rejected because of version conflicts. I read about ?conflicts=proceed, but I am not sure whether it will leave some records undeleted. That's why I wanted to add the retry first and then wait_for.
In addition to the standard parameters like `pretty`, the Delete By Query API also supports `refresh`,
`wait_for_completion`, `wait_for_active_shards`, and `timeout`.
As I read it, the first three parameters are booleans: &refresh=true&wait_for_completion=true...
I see. It looks strange, because in this part the docs say ?refresh takes three values (true, false, wait_for).
And delete has a specific param, ?wait_for_completion=true, which I believe is the same as ?refresh=wait_for.
Anyhow, I have another question: the ?retry_on_conflict=3 param doesn't work for delete_by_query but works for update.
Also, it's not clear why it doesn't retry by default, as the docs for delete_by_query say:
_delete_by_query relies on a default policy to retry rejected requests (up to 10 times, with exponential back off)
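To make the quoted sentence concrete, here is a minimal Python sketch of what "retry up to 10 times, with exponential back off" generally means. The names `backoff_delays`, `retry_with_backoff` and `RejectedError` are hypothetical, and the 0.5 second base delay and the doubling factor are assumptions for illustration, not Elasticsearch's actual values. Note that this kind of retry is about rejected requests, not version conflicts.

```python
import time


class RejectedError(Exception):
    """Hypothetical stand-in for a request rejected by a busy node."""


def backoff_delays(max_retries=10, base_delay=0.5):
    """Yield the wait before each retry, doubling every time (assumed values)."""
    for attempt in range(max_retries):
        yield base_delay * (2 ** attempt)


def retry_with_backoff(send, max_retries=10, base_delay=0.5, sleep=time.sleep):
    """Call send(); on RejectedError wait and retry, up to max_retries times."""
    delays = backoff_delays(max_retries, base_delay)
    while True:
        try:
            return send()
        except RejectedError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last rejection
            sleep(delay)
```

Passing `sleep` as a parameter only makes the sketch easy to try without actually waiting.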
IMO retry_on_conflict is only used when updating a document; in that case a version conflict may occur. When you delete a doc, there is no conflict "possible": the doc is deleted.
In the response, "retries" counts cases where the initial request failed (cluster error, timeout, etc.); it is not about version conflicts.
retry_on_conflict isn't supported by delete-by-query and update-by-query because we don't have a mechanism to re-check that the document still matches the query after the conflict. It may not. Your only option is to ignore conflicts (conflicts=proceed) and redo the entire request if there are any conflicts. Delete-by-query at least will be much smaller the second time around.
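The "proceed, then redo the whole request" approach described above could be sketched roughly like this in Python. `delete_until_no_conflicts` and `run_delete` are hypothetical names: `run_delete` is assumed to be a caller-supplied function that sends the delete-by-query with conflicts=proceed and returns the parsed JSON response, whose `deleted` and `version_conflicts` counters are read here. The limit of 5 rounds is an arbitrary assumption.

```python
def delete_until_no_conflicts(run_delete, max_rounds=5):
    """Repeat a delete-by-query until a round reports zero version conflicts.

    run_delete is a caller-supplied (hypothetical) function that sends
    POST <index>/_delete_by_query?conflicts=proceed and returns the
    parsed JSON response as a dict.
    """
    total_deleted = 0
    for round_number in range(1, max_rounds + 1):
        response = run_delete()
        total_deleted += response.get("deleted", 0)
        if response.get("version_conflicts", 0) == 0:
            # Nothing was skipped in this round, so we are done.
            return {"deleted": total_deleted, "rounds": round_number}
    raise RuntimeError("version conflicts remained after %d rounds" % max_rounds)
```

A refresh between rounds is probably needed so that the next round's search sees the previous round's changes; that part is an assumption and is left to `run_delete`.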
Not supporting wait_for is my fault on both counts. I made wait_for and I made the reindex infrastructure that powers delete-by-query, and I didn't link them because it is hard. wait_for hooks pretty deep into the shard to know when a refresh occurs, but delete-by-query functions at a much higher level. It could use refresh=wait_for on every bulk request that it sends, but that'd slow down every bulk request while it waits for a refresh for each one. Delete-by-query can't hook into the wait_for infrastructure in any other way. So it doesn't. But the docs don't reflect that. I'll fix that.
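Given that, a client can at least avoid sending refresh=wait_for to delete-by-query in the first place. The following Python sketch builds the request URL and accepts only a boolean refresh. The function name `delete_by_query_url`, the host, and the index name are hypothetical; the check simply mirrors what this thread reports for 5.4.1.

```python
from urllib.parse import urlencode


def delete_by_query_url(base_url, index, refresh=True, conflicts="proceed"):
    """Build a _delete_by_query URL that uses only a boolean refresh."""
    if not isinstance(refresh, bool):
        # Per this thread, refresh=wait_for is not supported here (5.4.1 warns).
        raise ValueError("refresh must be True or False for delete-by-query")
    if conflicts not in ("abort", "proceed"):
        raise ValueError("conflicts must be 'abort' or 'proceed'")
    params = {"refresh": "true" if refresh else "false", "conflicts": conflicts}
    return "%s/%s/_delete_by_query?%s" % (base_url.rstrip("/"), index, urlencode(params))


# Hypothetical host and index name, for illustration only.
print(delete_by_query_url("http://localhost:9200", "my-index"))
```

The query itself would still go in the POST body; only the URL parameters are covered here.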
But it's still not clear to me: what should I do when a version conflict arises during delete_by_query?
What I did was add _delete_by_query/?refresh=wait_for. It prints warnings, but version conflicts don't occur anymore, so does it actually work?
Or does it interpret any non-empty string as refresh=true?
What happens if I use the conflicts=proceed param? Will it skip the deletes that conflict, so that I need to retry manually?
It isn't clear to me either, as I've been having issues with ?refresh=wait_for not doing what I expected from the docs. I have also tried ?refresh=true with similar behaviour (i.e. it appears to do nothing). Is there something else I should try?