Hi,
We have an index with 50 million docs, and from time to time we would like to purge some of its data.
To do that, we would like to use a delete-by-query.
I was trying to understand the limitations of delete by query - meaning, is there a recommended threshold for the number of documents that should be purged in one request?
Our filter, in some cases, can potentially lead to the deletion of 10 million documents.
Is that a reasonable amount to delete, or is it recommended to work in smaller batches?
Delete-by-query deletes documents in batches, so it can handle a large number of docs in one go.
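For reference, a delete-by-query request looks roughly like this (a sketch: the index name `my-index` and the range filter are placeholders, not from this thread; the `scroll_size` parameter controls the per-batch size, which defaults to 1000):

```
POST /my-index/_delete_by_query?scroll_size=5000
{
  "query": {
    "range": {
      "timestamp": { "lt": "now-90d" }
    }
  }
}
```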
The primary potential drawbacks of deleting a large number of docs in one go are:

- The underlying scroll search retains resources on the nodes holding the shard copy used for the search (one per shard). Until it completes, it prevents deletion of the old segments needed to provide a consistent snapshot of the data.
- If a data node involved, or the node running delete-by-query, is restarted, the process will fail.
10 million docs does sound reasonable to do in one go, though your mileage may vary depending on hardware and other activity.
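If the restart risk is a concern, one option is to run the delete as a background task and, optionally, parallelize it across shards with slicing (again a sketch, using the same placeholder index and filter):

```
POST /my-index/_delete_by_query?wait_for_completion=false&slices=auto
{
  "query": {
    "range": {
      "timestamp": { "lt": "now-90d" }
    }
  }
}
```

With `wait_for_completion=false` this returns a task id you can poll via `GET /_tasks/<task_id>`, and if the task fails partway, the same request can simply be re-run to delete the remaining matching documents.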