I am sure I'm being dense, but the following code using the Java API works... sometimes. However, there are cases where it is unable to delete the document. I am then able to use the REST interface to delete that same document just fine. I don't have any non-default routing set up on the index, and in either case I'm not really specifying anything different in the REST call than through the Java API.
Java:
for (SearchHit hit : searchResp.getHits().getHits()) {
    // ... whole bunch of code that checks whether this is a document that needs to be deleted.
    DeleteResponse response = this.client.prepareDelete(index, hit.getType(), hit.getId()).get();
    boolean success = response.isFound();
    if (!success) {
        // Please help me figure out why I get here. I know the document is there because I just found it.
        // Output in a format so that I can easily drop it into a REST request, but it makes no sense.
    }
    // ... scroll API stuff ...
}
The REST request that succeeds no problem:
curl -XDELETE 'https://my_cluster_address:9243/rstatus/_query' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "_id": [
            "doc_id_copied_and_pasted_from_java_output",
            "2nd_failed_doc_id..."
          ]
        }
      }
    }
  }
}'
response.isFound() returns whether the document that should be deleted was found. So if it returns false, it may mean that the document was created in between, or that you misread the REST delete response.
Can you show the output of a REST delete response for one of the documents that doesn't match your success boolean?
Alex,
Thank you for the response.
I can't show any output from the REST now because I've cleaned up the index manually and I didn't save any, but the REST command returned that it found the document and deleted it. When I searched for it again, it was gone.
However, I can tell you that I shut down all applications using the index and the bit of Java I was running is single-threaded. I use a scroll search to look through all of the documents in the index, do some checking to see if they need to be deleted, and if so, do the deletion. There are no changes to the index between the time the scroll search is run and the time the program starts looping through the results. The documents are deleted in the same thread of execution.
Are they unable to be deleted because they are currently part of a result set that I am scrolling through? Should I be saving off the list of IDs to delete and then do that operation separately? That wouldn't explain why some other documents were able to be deleted while I was still going through the original search results.
If necessary I'll add some invalid documents and re-run, but I'd prefer not to...I'd rather hear that there is a different best practice for deleting documents from an index...but it doesn't sound like that is the case.
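For reference, the "save off the list of IDs, then delete separately" idea from the question could be sketched roughly as below. This is only an illustration in plain Java and does not use the Elasticsearch client: the Deleter interface, the TwoPassDelete class, and the in-memory maps are hypothetical stand-ins for the real delete call, the index, and the scroll results. It also reports which IDs came back as "not found" so they can be inspected afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class TwoPassDelete {

    /** Hypothetical stand-in for a delete call; returns true if the document was found and deleted. */
    interface Deleter {
        boolean delete(String id);
    }

    /** Pass 1: walk the search results (id -> content) and collect the IDs that should be deleted. */
    static List<String> collectIds(Map<String, String> hits, Predicate<String> shouldDelete) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> hit : hits.entrySet()) {
            if (shouldDelete.test(hit.getValue())) {
                ids.add(hit.getKey());
            }
        }
        return ids;
    }

    /** Pass 2: delete each collected ID and return the IDs that were reported as not found. */
    static List<String> deleteAll(List<String> ids, Deleter deleter) {
        List<String> notFound = new ArrayList<>();
        for (String id : ids) {
            if (!deleter.delete(id)) {
                notFound.add(id);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // In-memory map standing in for the index (assumption for illustration only).
        Map<String, String> index = new LinkedHashMap<>();
        index.put("a", "valid");
        index.put("b", "invalid");
        index.put("c", "invalid");

        // A copy plays the role of the results that the search returned.
        Map<String, String> results = new LinkedHashMap<>(index);

        List<String> ids = collectIds(results, "invalid"::equals);
        List<String> notFound = deleteAll(ids, id -> index.remove(id) != null);
        System.out.println("delete candidates: " + ids + ", not found: " + notFound);
    }
}
```

Separating the two passes does not by itself explain the failures, but it makes the list of "not found" IDs easy to log and compare against what the REST interface reports.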
A scroll search is just a point-in-time view (from the moment the search is initiated), which means you would not see it if another component deleted a document in between that was found by the scroll search.
That said, if everything is serialized and there is no concurrent indexing or deleting while the scroll search is running, this should not happen.
Is there any way for you to reliably reproduce this? Does it happen every time?
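To illustrate the point-in-time behaviour described above, here is a small plain-Java sketch. It does not use Elasticsearch: the PointInTimeView class and the in-memory collections are hypothetical stand-ins, with a copied list playing the role of the scroll results and a set playing the role of the live index. An ID removed from the live set after the copy was taken still shows up in the copy, which is the situation where a later delete would report "not found".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PointInTimeView {

    /** Returns the IDs that appear in the snapshot but are no longer present in the live set. */
    static List<String> staleIds(List<String> snapshot, Set<String> live) {
        List<String> stale = new ArrayList<>();
        for (String id : snapshot) {
            if (!live.contains(id)) {
                stale.add(id);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Live set standing in for the index (assumption for illustration only).
        Set<String> live = new LinkedHashSet<>(Arrays.asList("doc1", "doc2", "doc3"));

        // The copy is frozen at the moment the "search" starts.
        List<String> snapshot = new ArrayList<>(live);

        // Something else deletes doc2 after the snapshot was taken.
        live.remove("doc2");

        System.out.println("still listed in snapshot but gone from index: " + staleIds(snapshot, live));
    }
}
```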