Slow deletes

erikstephens · March 28, 2016, 3:02pm

I'm on version 2.1.2 and using the delete-by-query API to delete records based on their type. It has been slowly deleting documents for the past couple of days, at roughly 1-2 million documents per day. I'm not seeing anything in the logs, and cluster load is OK. Any ideas on how to speed up an operation like that, or where to look for details on why it's progressing so slowly? Also, if I submit multiples of the same delete query, will that effectively parallelize it, or will they end up competing with each other? Thanks!

nik9000 · March 28, 2016, 4:32pm

Have you seen it be faster in the past with the delete-by-query plugin?

The delete-by-query plugin and the reindex API coming in 2.3/5.0 are single-threaded things built on top of the scroll API. They aren't designed for speed so much as simplicity and stability.

You are generally better off figuring out some way to delete whole indexes at a time rather than deleting by query. That being said, you can parallelize the process by launching the API more than once, with constraints on each request that make sure they don't overlap. Like "delete all documents with this tag" and "delete all documents with this other tag", etc. If the requests overlap then they are unlikely to gain any speed, though they will probably still work.
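A rough sketch of that idea in Python, in case it helps. It only builds one term query per distinct tag value and fans the requests out over a small thread pool. The field name `tag` and the `send` callable are placeholders I made up; `send` stands for whatever function actually issues the delete-by-query request in your setup, and the queries are only non-overlapping if each document carries a single value for the field.

```python
from concurrent.futures import ThreadPoolExecutor


def build_delete_queries(field, values):
    """Build one query body per distinct value of `field`.

    Distinct values give disjoint document sets only if every document
    holds a single value for `field` (an assumption, not a guarantee).
    """
    return [{"query": {"term": {field: value}}} for value in sorted(set(values))]


def run_in_parallel(bodies, send, max_workers=4):
    """Pass each body to `send` on a worker thread.

    `send` is a hypothetical, caller-supplied function that performs the
    actual delete-by-query request. Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, bodies))
```

Keeping `max_workers` small seems sensible here, since each request is still a scroll plus bulk deletes on the cluster side.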

erikstephens · March 28, 2016, 5:05pm

It seemed faster in other cases, but that was only anecdotal, and I was deleting an order of magnitude fewer documents. Thanks for the feedback. More and more, it seems like the value of multiple types in an index is diminishing. Wondering if I'd be better off with more type/process specific indexes (eg syslog-YEAR.MONTH, appA-YEAR.MONTH, appB-YEAR.MONTH) instead of more general indexes (eg logstash-YEAR.MONTH.DAY). A bit off-topic, but I'd be interested in hearing if anyone else made a similar transition and how that worked out for them.

magnusbaeck · March 31, 2016, 7:21am

Wondering if I'd be better off with more type/process specific indexes (eg syslog-YEAR.MONTH, appA-YEAR.MONTH, appB-YEAR.MONTH) instead of more general indexes (eg logstash-YEAR.MONTH.DAY).

If you have different retention needs for different kinds of logs then that's definitely the way to go. If you frequently find yourself making delete-by-query requests you're probably doing something wrong.
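To illustrate, with per-kind monthly indexes retention becomes a matter of picking whole indexes to drop. Here is a minimal Python sketch, assuming names of the form PREFIX-YYYY.MM as in the examples above; the function name and the `retention_months` mapping are made up for illustration, and actually deleting the returned indexes is left to whatever tooling you use.

```python
from datetime import date


def expired_indexes(index_names, retention_months, today):
    """Return the monthly indexes (PREFIX-YYYY.MM) past their retention.

    `retention_months` maps a prefix such as "syslog" to the number of
    months to keep. Names with an unknown prefix or a suffix that is not
    YYYY.MM are skipped rather than guessed at.
    """
    expired = []
    for name in index_names:
        prefix, _, stamp = name.rpartition("-")
        if prefix not in retention_months:
            continue  # not one of ours: leave it alone
        try:
            year, month = (int(part) for part in stamp.split("."))
        except ValueError:
            continue  # e.g. a daily index like PREFIX-YYYY.MM.DD
        age = (today.year - year) * 12 + (today.month - month)
        if age > retention_months[prefix]:
            expired.append(name)
    return expired
```

Calling it with something like `{"syslog": 3, "appA": 6}` and today's date gives the list of indexes that can be deleted outright, with no delete-by-query involved.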

erikstephens · March 31, 2016, 1:45pm

I definitely did something wrong. An error in my configs caused some documents to get indexed incorrectly. In my example, I increased the index retention to keep the number of open indexes about the same, assuming we'd have fewer than 30 type-specific indexes.
