Hi,
According to the documentation, the delete-by-query plugin is built on the scroll and bulk APIs. Scroll returns 10 hits per batch by default.
Can this size be increased? With the default setting, deleting the documents takes too long.
If not, is there a faster way to delete the documents?
I am using the following code:
new DeleteByQueryRequestBuilder(esClient, DeleteByQueryAction.INSTANCE)
    .setIndices(index)
    .setQuery(deleteQuery)
    .get()
Pre-5.0 the parameter is named size, which is confusing. It is documented here. 10 is a very, very small default size. In the transport client you'd do something like DeleteByQueryAction.newRequestBuilder(client).setSource(new SearchSourceBuilder().size(1000)).setIndices(index).setQuery(deleteQuery).get(). It is a bit funky to do, sorry!
We rewrote delete-by-query to use the same infrastructure as reindex and update by query in 5.0. The parameter is named scroll_size. In the transport client it is fairly similar to 2.x.
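To get a rough feel for why the batch size matters so much, here is a minimal sketch that only does the arithmetic. It assumes one bulk request per scroll page, which is a simplification, and the ScrollBatches class and roundTrips method are hypothetical names for illustration, not part of Elasticsearch. The 110k figure is the document count mentioned later in this thread.

```java
// Hypothetical helper, not part of Elasticsearch: estimates how many
// scroll/bulk round trips a delete-by-query needs for a given batch size.
final class ScrollBatches {

    /** Batches needed to cover totalDocs when each scroll page holds batchSize hits. */
    static long roundTrips(long totalDocs, int batchSize) {
        if (totalDocs < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and batchSize > 0");
        }
        // Ceiling division: a final partial page still costs one round trip.
        return (totalDocs + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long docs = 110_000; // assumed document count, taken from the thread
        for (int size : new int[] {10, 1_000, 10_000}) {
            System.out.println("size=" + size + " -> " + roundTrips(docs, size) + " batches");
        }
    }
}
```

With the default of 10 that is 11,000 batches for 110k documents, versus 110 at a size of 1000 and 11 at 10000, so most of the time is likely spent on round trips rather than on the deletes themselves.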
I tried the solution, but the setSource method does not take a SearchSourceBuilder as an argument. I had to call toString on the SearchSourceBuilder to get it to work.
But my problem still persists.
I ran the delete query on 110k documents with a size of 50k via curl, and it deleted only 50k documents. It reported 60k as failed.
The next time I ran the query, it showed that 60k were found but the deletion failed.
I cannot understand why it was able to delete the first time but not the second.
@nik9000
This did the trick for me:
DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
.setSource(new SearchSourceBuilder().query(filters).size(10000).toString).setIndices(queryIndex).get()
The deletion completed within a couple of seconds.