My requirement is to delete a particular document (with doc_id) from multiple indices in one go. The catch is that I don't know which indices contain this doc, so I would have to fire a delete request for the doc_id against every index present in the system.
When I tried to delete via the bulk API by passing each index and doc ID, I got a "request too big" error (status code: 429). The only alternative I could think of is to send a single delete request (DeleteByQuery) targeting all the indices that have to be considered.
My question is whether DeleteByQuery is costlier than delete-by-doc-ID-and-index when it touches all the indices present.
Is there any alternative approach I could give a shot?
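For reference, a delete-by-query across all indices can match the document by its `_id` using an `ids` query, so only the indices that actually contain it are affected. Below is a minimal sketch of the request body for `POST /_all/_delete_by_query`; the doc ID is a hypothetical placeholder, and sending the request to your cluster is left out:

```python
import json

# Hypothetical document ID; replace with the real one.
DOC_ID = "my-doc-id"

# Body for POST /_all/_delete_by_query — the ids query matches the
# document by _id in whichever indices happen to contain it.
body = {
    "query": {
        "ids": {"values": [DOC_ID]}
    }
}

print(json.dumps(body, indent=2))
```

One request then covers every index, at the cost of the query phase having to run against all of them.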
Assuming that means 100,000 documents per index, it is still not necessarily a lot of data for an index unless the documents are massive. I would generally aim to have a shard size of 10GB to 50GB. What is your average shard size?
The shard size is around 1 GB. The current index design was required for a particular use case and all optimisations are 'work-in-progress'.
The main issue that I am facing is 'Request too Big' (status code 429). I was looking into ways to reduce the number of delete requests sent as part of a bulk request, or to send a DeleteByQuery instead, with the query pointing to the doc ID.
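If the bulk route is kept, one way to avoid the request-size error is to split the delete actions into several smaller bulk bodies instead of one huge request. This is a sketch, not an official client feature; the index names, doc ID, and size limit below are all assumptions for illustration (the real limit would be tuned below the cluster's `http.max_content_length`):

```python
import json

# Hypothetical list of (index, doc_id) pairs — in practice, one entry
# per index in the cluster, all for the same doc_id.
actions = [("index-%d" % i, "my-doc-id") for i in range(10)]

MAX_BODY_BYTES = 1024  # assumed per-request limit for this sketch


def bulk_bodies(pairs, max_bytes=MAX_BODY_BYTES):
    """Yield NDJSON bulk bodies, each kept under max_bytes."""
    chunk, size = [], 0
    for index, doc_id in pairs:
        # One delete action line per (index, doc_id) pair.
        line = json.dumps({"delete": {"_index": index, "_id": doc_id}}) + "\n"
        if chunk and size + len(line) > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)


# Each yielded body would be sent as a separate POST /_bulk request.
bodies = list(bulk_bodies(actions))
print(len(bodies), "bulk request(s)")
```

Each smaller body is then sent as its own `_bulk` request, so no single request exceeds the limit.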