Delete_by_query performance optimization


Many topics recommend time-based indices as the go-to method for clearing old data. Where that is not possible, the alternative is delete_by_query, which has a noticeable performance impact when working with a large number of documents.
Consider clearing data from 100 indices each day (via a cron job), where each index stores 100K documents per day, which amounts to 10M documents to delete per day. What would be the performance impact on Elasticsearch? New data is always coming in, and the data is queried, but existing documents are never updated.
Is it better to delete that data with a single delete_by_query targeting an index pattern that covers all 100 indices, or with 100 separate delete_by_query requests, one per index?
If 100 requests are better than one, is it better to spread them throughout the day or run them all in one go?
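
To make the two options concrete, here is a minimal Python sketch that only builds the REST requests (it does not send them). The index names, the `@timestamp` field, and the 30-day cutoff are hypothetical placeholders; `conflicts`, `slices`, and `wait_for_completion` are standard `_delete_by_query` query parameters. It also checks the daily volume: 100 indices × 100,000 documents = 10,000,000 documents.

```python
from typing import Dict, List, Tuple

DOCS_PER_INDEX_PER_DAY = 100_000
NUM_INDICES = 100

# Hypothetical index names; substitute your own.
INDICES: List[str] = [f"my-index-{i:03d}" for i in range(NUM_INDICES)]


def dbq_request(target: str, cutoff: str = "now-30d/d") -> Tuple[str, Dict[str, str], dict]:
    """Return (path, query params, body) for one _delete_by_query call.

    The '@timestamp' field and the 30-day cutoff are assumptions for illustration.
    """
    path = f"/{target}/_delete_by_query"
    params = {
        "conflicts": "proceed",          # count version conflicts instead of aborting
        "slices": "auto",                # let Elasticsearch pick the number of parallel slices
        "wait_for_completion": "false",  # run as a background task; poll it via the _tasks API
    }
    body = {"query": {"range": {"@timestamp": {"lt": cutoff}}}}
    return path, params, body


# Option A: one request over an index pattern covering all 100 indices.
single = dbq_request("my-index-*")
# Option B: one request per index.
per_index = [dbq_request(name) for name in INDICES]

total_docs = NUM_INDICES * DOCS_PER_INDEX_PER_DAY
print(single[0])
print(len(per_index))
print(f"{total_docs:,}")
```

Both options delete the same documents; the difference is only how the work is split into requests, which is what decides how much load hits the cluster at once.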

Thank you!

If your data is immutable, I would recommend going with time-based indices, as they are much more efficient. Why do you think you cannot use them?
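
For comparison, with time-based indices retention becomes "drop whole indices older than N days" rather than deleting documents one by one. A minimal sketch, assuming a hypothetical `prefix-YYYY.MM.DD` daily naming scheme and a 30-day retention window:

```python
from datetime import date, datetime, timedelta
from typing import List


def expired_daily_indices(names: List[str], today: date, retention_days: int) -> List[str]:
    """Return daily indices (hypothetical 'prefix-YYYY.MM.DD' naming) older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        _, _, suffix = name.rpartition("-")
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index; leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["my-index-2024.01.01", "my-index-2024.01.30", "my-index-2024.02.01", "other-index"]
for name in expired_daily_indices(names, today=date(2024, 2, 1), retention_days=30):
    # Deleting a whole index removes its files outright, instead of marking
    # millions of documents as deleted and waiting for merges to reclaim space.
    print(f"DELETE /{name}")
```

In practice, index lifecycle management (ILM) can automate this with a delete phase, so no custom script is needed.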

If you need to stick with DBQ, I suspect you would be best off distributing the load by processing one or a few indices at a time. You will, however, need to test and see how much this affects your cluster.
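
A minimal sketch of the "one or a few indices at a time" approach: split the indices into small batches and space the batch start times evenly across the day. The batch size of 5, the index names, and the `requests_per_second` value in the comment are hypothetical; `requests_per_second` itself is a real `_delete_by_query` throttling parameter. Whether spreading the work actually helps on a given cluster still has to be measured.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def spread_batches(indices: List[str], batch_size: int, start: datetime,
                   window: timedelta) -> List[Tuple[datetime, List[str]]]:
    """Split indices into batches and space the batch start times evenly over the window."""
    if not indices:
        return []
    batches = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
    step = window / len(batches)  # e.g. 24h / 20 batches = 72 minutes between batches
    return [(start + n * step, batch) for n, batch in enumerate(batches)]


indices = [f"my-index-{i:03d}" for i in range(100)]  # hypothetical names
schedule = spread_batches(indices, batch_size=5,
                          start=datetime(2024, 1, 1, 0, 0), window=timedelta(hours=24))

for when, batch in schedule[:2]:
    # Each index in the batch would get its own throttled request, e.g.
    # POST /<index>/_delete_by_query?requests_per_second=500&conflicts=proceed
    print(when.strftime("%H:%M"), batch[0], "...", batch[-1])
```

Smaller batches and a lower `requests_per_second` trade a longer total run time for a gentler, steadier load next to ongoing indexing and search.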

Thank you very much for the answers.
There are two main reasons for not using time-based indices. The first is that the shift to time-based indices would take time to implement in other parts of the system, which unfortunately is not acceptable at this time.
The second is that the number of indices is already high, and switching to time-based indices would at least double it.
So the question is which of these would hurt performance more: doubling (potentially tripling) an already high number of indices (in the thousands), or running daily DBQ cron jobs that delete millions of documents?

I do not know and suspect you will need to test to find out.

Ok, thank you very much for the replies!
