Silent failures with delete-by-query

I've examined all the questions on this subject. None seems to address the problem I'm having.

I need to run multiple delete_by_query operations in a loop. In my experimental setup this is just a handful. The problem I keep hitting, intermittently, is with the "deleted" field in the response object.

I know for a fact that each query should delete multiple documents (Lucene documents, "LDocs" below) from the index: anywhere from 10 to several hundred. So "deleted" in the response should never be 0 in what follows:

    #[derive(Deserialize, Debug)]
    struct DeleteResponse {
        took: usize,
        // in this field the ES server tells you how many LDocs have been deleted:
        deleted: usize,
        total: usize,
        batches: usize,
        ...
    }

    let url = format!("{ES_URL}/{}/_delete_by_query?refresh=true", self.index_name);
    for _ in 0..x {
        ...
        let text_document = ... // obtain from map
        ...
        let data: serde_json::Value = json!({
            "query": {
                "match": {
                    "text_ldoc_number": text_document.text_doc_ldoc_number,
                }
            }
        });
        // NB as stated, the above always matches between 10 and several hundred LDocs

        // allowing the possibility of trying several times
        let delete_attempts = 5;
        for i in 0..delete_attempts {
            let delete_response: DeleteResponse = reqwest_call!(&url, reqwest::Method::POST, body_str=&data.to_string())?;
            if delete_response.deleted == 0 {
                if i == delete_attempts - 1 {
                    // report the number from the document being processed and the real attempt count
                    return Err(anyhow!(
                        "deletion of LDocs with text_doc_ldoc_number {} resulted in 0 LDocs being deleted after {delete_attempts} attempts",
                        text_document.text_doc_ldoc_number
                    ));
                }
                // not the final try: sleep thread for 20 ms
                let wait_time_ms = time::Duration::from_millis(20);
                thread::sleep(wait_time_ms);
            } else {
                // deleted is non-zero: this operation has worked OK: leave "tries" loop
                break;
            }
        }
    }

As can be seen, I've tried tacking on "?refresh=true" to the URL. In fact this doesn't seem to make much difference.
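As far as I understand the documentation, "refresh=true" on _delete_by_query refreshes the affected shards only after the request completes, so it would not make documents indexed just before the call visible to the query itself. A minimal sketch of building the URL for an explicit refresh (POST to the index's _refresh endpoint) to send before each delete. The helper names refresh_url and delete_by_query_url are hypothetical, and the base URL and index name in any call are placeholders:

```rust
/// Hypothetical helper: URL for an explicit refresh of one index.
/// POSTing to it before the delete should make recently indexed
/// documents searchable (assumption: that is the cause of the zero counts).
fn refresh_url(es_url: &str, index_name: &str) -> String {
    format!("{}/{}/_refresh", es_url.trim_end_matches('/'), index_name)
}

/// Hypothetical helper: the same delete-by-query URL as in the loop above.
fn delete_by_query_url(es_url: &str, index_name: &str) -> String {
    format!(
        "{}/{}/_delete_by_query?refresh=true",
        es_url.trim_end_matches('/'),
        index_name
    )
}
```

If the zero counts disappear once a refresh is sent first, that would point at unrefreshed documents rather than at the delete itself.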

Setting the millisecond value as above (20 ms) I find that the deleted value is sometimes 0 and sometimes not. Setting to a lower ms value tends to produce more 0 values. But the behaviour is very unstable: with some runs I can have no failures, and everything just works on the first try.
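For experimenting with different wait times, the retry logic can be pulled out into a small helper. This is only a sketch using the standard library: retry_with_backoff is a hypothetical name, and doubling the delay after each failed try is my own choice here, not something ES requires.

```rust
use std::{thread, time::Duration};

/// Calls `attempt` up to `max_attempts` times, sleeping between failed tries
/// with a delay that doubles each time, starting from `initial_delay`.
/// Returns the first `Some` value, or `None` if every attempt returned `None`.
fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    mut attempt: impl FnMut(u32) -> Option<T>,
) -> Option<T> {
    let mut delay = initial_delay;
    for i in 0..max_attempts {
        if let Some(value) = attempt(i) {
            return Some(value);
        }
        // No sleep after the final attempt.
        if i + 1 < max_attempts {
            thread::sleep(delay);
            delay = delay.saturating_mul(2);
        }
    }
    None
}
```

In the loop above, the closure would send the delete_by_query request and return Some(response) only when the "deleted" field is non-zero.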

Tuning the sleep value by trial and error like this is obviously unsatisfactory. I want to find out the actual reasons behind these (silent) intermittent failures of my ES requests. It appears ES may need time to "digest" these deletions before moving on to the next delete_by_query operation. Can I detect when this "digestion" has ended?
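One way to narrow down the cause is to look at the other counters in the same response. Besides "deleted", the delete-by-query response also carries "total" (documents the query matched) and "version_conflicts". A rough sketch of telling the cases apart follows. The type and function names are hypothetical, and as I understand it "version_conflicts" is only reported in a successful response when the request is sent with conflicts=proceed, since by default a conflict aborts the request.

```rust
/// Subset of the delete-by-query response counters relevant to a zero `deleted`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DeleteCounters {
    total: usize,
    deleted: usize,
    version_conflicts: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum Diagnosis {
    /// The query matched nothing at the moment it ran.
    NothingMatched,
    /// Every matched document was deleted.
    AllDeleted,
    /// Documents matched, but at least one was skipped due to a version conflict.
    VersionConflicts,
    /// Matched documents were neither all deleted nor reported as conflicts.
    Unexplained,
}

/// Hypothetical classification of one response.
fn diagnose(c: DeleteCounters) -> Diagnosis {
    if c.total == 0 {
        Diagnosis::NothingMatched
    } else if c.deleted == c.total {
        Diagnosis::AllDeleted
    } else if c.version_conflicts > 0 {
        Diagnosis::VersionConflicts
    } else {
        Diagnosis::Unexplained
    }
}
```

If the zero-deleted responses turn out to have total equal to 0 as well, the query itself saw no matching documents, which would suggest a visibility (refresh) issue rather than a failed deletion.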

NB sometimes the very first delete_by_query operation fails on the first attempt. Occasionally the very first operation fails 5 times and thus raises the Err. So it appears that before running any delete_by_query a check needs to be made that the index and server are in a particular "settled/receptive" state...
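For the "settled/receptive" check, the cluster health API accepts wait_for_status and timeout parameters and can be scoped to a single index. Whether this actually addresses the first-attempt failures is an assumption on my part. A sketch of building such a URL, where wait_for_status_url is a hypothetical helper name:

```rust
/// Hypothetical helper: URL for a cluster health request that blocks until the
/// index reaches at least `status` ("green", "yellow" or "red") or until
/// `timeout` (for example "10s") elapses.
fn wait_for_status_url(es_url: &str, index_name: &str, status: &str, timeout: &str) -> String {
    format!(
        "{}/_cluster/health/{}?wait_for_status={}&timeout={}",
        es_url.trim_end_matches('/'),
        index_name,
        status,
        timeout
    )
}
```

A GET on this URL before the first delete_by_query returns a body with a "timed_out" field, which indicates whether the requested status was reached within the timeout.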
