Delete by query: refresh=wait_for support?


(Sergey Kozachenko) #1

When I call delete_by_query with ?refresh=wait_for, I get this warning:

_delete_by_query/?refresh=wait_for] returned 1 warnings: [299 Elasticsearch-5.4.1-2cfe0df "Expected a boolean [true/false] for request parameter [refresh] but got [wait_for]

Is it supported? The docs say it is.

And when I call delete_by_query with ?retry_on_conflict=3, it fails with an invalid parameter error.

I wanted to use these because when I delete records, some of the deletes are rejected because of version conflicts. I read about ?conflicts=proceed, but I am not sure whether it will leave some of the records I meant to delete in place. That's why I wanted to add a retry first and then wait_for.


(Xavier Facq) #2

Hi,

What version are you running?

Docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/docs-delete-by-query.html


(Sergey Kozachenko) #3

Elasticsearch-5.4.1

This page says that delete supports the refresh parameter:
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/docs-refresh.html

Does that mean ?refresh=wait_for is supported in update but not in delete?

And why doesn't delete support ?retry_on_conflict=3?

I see that when it fails with a version conflict, the response has:

retries: {bulk: 0, search: 0}

This means it failed on the first try, but the docs at your link say it should try up to 10 times.


(Xavier Facq) #4

I think there is a misunderstanding about the docs:

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/docs-delete-by-query.html

In addition to the standard parameters like `pretty`, the Delete By Query API also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, and `timeout`.

As I read it, the first three parameters are booleans: &refresh=true&wait_for_completion=true...


(Sergey Kozachenko) #5

I see. It looks strange that this part of the docs says ?refresh has three values (true, false, wait_for), while delete has a specific parameter, ?wait_for_completion=true, which I believe is the same as ?refresh=wait_for.

Anyhow, I have another question about the ?retry_on_conflict=3 parameter, which doesn't work for delete_by_query but works for update.

Also, it's not clear why it doesn't retry by default, as the delete_by_query docs say:

_delete_by_query relies on a default policy to retry rejected requests (up to 10 times, with exponential back off)

--Thanks.


(Xavier Facq) #6

IMO retry_on_conflict is only used when updating a document; in that case, a version conflict may occur. When you delete a doc, no conflict is "possible": the doc is simply deleted.

In the response, "retries" counts the times the initial request failed (cluster error, timeout, etc.) and was retried; this is not a version conflict.
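A minimal Python sketch of that distinction, assuming a generic retry helper rather than Elasticsearch's actual implementation. As I understand the quoted "retry rejected requests" policy, it covers requests rejected because a queue was full; here those are modelled by a hypothetical `RejectedError` and retried with exponentially growing delays. The 50 ms initial delay and the doubling factor are illustrative assumptions, not Elasticsearch's real backoff values. Any other failure, such as a version conflict, propagates on the first attempt, so the retry count stays at zero.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class RejectedError(Exception):
    """Hypothetical stand-in for a request rejected because a queue was full."""


def backoff_delays(initial_ms: int = 50, max_retries: int = 10,
                   factor: float = 2.0) -> List[int]:
    """Delays (ms) before each retry: initial, initial*factor, initial*factor**2, ..."""
    return [int(initial_ms * factor ** attempt) for attempt in range(max_retries)]


def call_with_backoff(fn: Callable[[], T], delays: List[int],
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[T, int]:
    """Call fn, retrying only on RejectedError. Returns (result, retry_count).

    Any other exception (e.g. a version conflict) propagates immediately
    without being retried.
    """
    retries = 0
    for delay_ms in delays:
        try:
            return fn(), retries
        except RejectedError:
            sleep(delay_ms / 1000.0)  # wait before the next attempt
            retries += 1
    return fn(), retries  # last attempt: a RejectedError here propagates
```

With the default of 10 delays, the helper makes at most 11 attempts: the original call plus 10 retries.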

bye,
Xavier


(Nik Everett) #7

I can answer this because most of it is my fault!

retry_on_conflict isn't supported by delete-by-query and update-by-query because we don't have a mechanism to re-check that the document still matches the query after the conflict. It may not. Your only option is to ignore conflicts (conflicts=proceed) and redo the entire request if there are any conflicts. Delete-by-query, at least, will be much smaller the second time around.
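A minimal Python sketch of the "ignore conflicts and redo the entire request" approach. `send` is a hypothetical function you would supply: it takes the query-string parameters, performs `POST <index>/_delete_by_query` with your query, and returns the parsed JSON response. The loop reads the `deleted`, `version_conflicts` and `failures` fields of the delete-by-query response; the `max_rounds` cap is an arbitrary safeguard of this sketch, not an Elasticsearch setting.

```python
from typing import Any, Callable, Dict

# Hypothetical transport supplied by the caller: sends the given query-string
# parameters with POST <index>/_delete_by_query and returns the parsed JSON body.
SendFn = Callable[[Dict[str, str]], Dict[str, Any]]


def delete_until_no_conflicts(send: SendFn, max_rounds: int = 5) -> int:
    """Re-run delete-by-query with conflicts=proceed until a pass reports no
    version conflicts. Returns the total number of documents deleted."""
    total_deleted = 0
    for _ in range(max_rounds):
        # conflicts=proceed: count conflicting docs instead of aborting the request
        resp = send({"conflicts": "proceed"})
        if resp.get("failures"):
            raise RuntimeError(f"delete_by_query failures: {resp['failures']}")
        total_deleted += resp.get("deleted", 0)
        if resp.get("version_conflicts", 0) == 0:
            return total_deleted  # nothing was skipped on this pass
        # Some matches were skipped; the next pass re-runs the query, so only
        # documents that still match are considered again.
    raise RuntimeError(f"still hitting version conflicts after {max_rounds} rounds")
```

Because each pass re-runs the query, documents deleted on earlier passes no longer match, which is why later passes are usually much smaller.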

Not supporting wait_for is my fault on both counts. I made wait_for and I made the reindex infrastructure that powers delete-by-query, and I didn't link them because it is hard. wait_for hooks pretty deep into the shard to know when a refresh occurs, but delete-by-query functions at a much higher level. It could use refresh=wait_for on every bulk request that it sends, but that would slow down every bulk request while it waits for a refresh for each one. Delete-by-query can't hook into the wait_for infrastructure in any other way, so it doesn't. But the docs don't reflect that. I'll fix that.


(Sergey Kozachenko) #8

Hi Nik,
Thank you for your answer.

But it's still not clear to me: what should I do when version conflicts arise during delete_by_query?

What I did was add _delete_by_query/?refresh=wait_for. It prints warnings, but version conflicts don't occur anymore, so does it actually work?

Or does it interpret any non-empty string as true, i.e. refresh=true?
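If that guess is right, `wait_for` is simply being read as `true`. The Python sketch below models lenient boolean parsing of a URL parameter, under the assumption (suggested by the deprecation warning text, but not verified here) that 5.x behaves like this: strict `true`/`false` are accepted silently, a bare parameter with an empty value counts as true, and any other value triggers the warning and is treated as false only if it is one of `false`, `0`, `off` or `no`. `param_as_boolean` is a hypothetical name for illustration, not Elasticsearch's code; check the 5.x source before relying on this.

```python
import warnings
from typing import Optional

# Values the (assumed) lenient parser treats as false; anything else is true.
LENIENT_FALSE = {"false", "0", "off", "no"}


def param_as_boolean(name: str, value: Optional[str], default: bool = False) -> bool:
    """Model of lenient boolean parsing for a URL parameter such as `refresh`."""
    if value is None:
        return default  # parameter absent
    if value == "":
        return True  # bare ?refresh means "turn it on"
    if value in ("true", "false"):
        return value == "true"  # strict values: no warning
    warnings.warn(
        f"Expected a boolean [true/false] for request parameter [{name}] "
        f"but got [{value}]"
    )
    return value not in LENIENT_FALSE
```

Under this model, `?refresh=wait_for` emits the warning and then behaves exactly like `?refresh=true`.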

What happens if I use the conflicts=proceed parameter? Will it skip the conflicting deletes, so that I need to retry manually?

Thank you.


(Terry Quigley) #9

Hi

It isn't clear to me either, as I've been having issues with ?refresh=wait_for not doing what I expected from the docs. I have also tried ?refresh=true with similar behaviour (i.e. it appears to do nothing). Is there something else I should try?

Thanks


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.