Best way to delete old data from elastic search

prashanthtg · March 14, 2025, 10:50pm

I want to manage the lifecycle of Elasticsearch documents so that they are removed after x days. The _delete_by_query API can do the job, but I understand it has a performance impact on queries/reads when deleting a large data set, with segment merging adding to the cost. What is the best way to accomplish this? Would rollover be a better option here? If I have to keep the state of a document for 90 days or more, would reading and aggregating across multiple indices over a large data set have no performance impact?

_delete_query_api
Index State Management

Christian_Dahlqvist · March 15, 2025, 7:04pm

Index State Management is what the functionality corresponding to ILM is called in OpenSearch. Is that what you are using?

It would help if you describe the use case. Is your data immutable or do you perform updates? Is it time-series data?

prashanthtg · March 16, 2025, 6:24am

It’s not time-series data. It is entity state that we have to maintain for 45 days, after which we have to auto-purge the record. We have around half a billion records, and we receive updates, or rather upserts. As I understand it, ILM is for time-series data.

Christian_Dahlqvist · March 16, 2025, 7:51am

The most efficient way to delete data from Elasticsearch or OpenSearch is to delete complete indices. This does however require the use of time-based indices, which complicate performing updates.

The other option is, as you pointed out, to use delete-by-query. The reason this is more expensive and can cause performance issues is that it deletes individual documents from indices, which is basically an update operation with a tombstone record that requires both a read and a write.

If the purge date/time is based on the creation date of the document and you have access to this date/time outside Elasticsearch when you perform the insert/update, you may be able to use the older style of time-based indices, where each index covers a specific, fixed time period that is indicated by the index name. When you index a document you would determine the name of the index to write to based on this static timestamp. You would then do the same whenever you update the document, so the update always goes to the index that already holds it.
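As a rough illustration of that naming scheme, here is a minimal Python sketch. The index prefix `entity-state`, the daily granularity and the helper names are assumptions made up for this example; only the 45-day retention comes from the thread. It derives the index name from the creation date and works out which indices are old enough to delete as a whole.

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 45             # retention period mentioned in the thread
INDEX_PREFIX = "entity-state"   # hypothetical prefix, pick your own


def index_for(created: date, prefix: str = INDEX_PREFIX) -> str:
    """Name of the daily index a document belongs to, based on its creation date."""
    return f"{prefix}-{created:%Y.%m.%d}"


def expired_indices(index_names, today: date,
                    retention_days: int = RETENTION_DAYS,
                    prefix: str = INDEX_PREFIX):
    """Return the indices whose day is older than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our time-based indices
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

Both inserts and upserts would call `index_for` with the same stored creation date, and a scheduled job would delete whatever `expired_indices` returns using the normal delete index API.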

If the deletion is not based on the creation timestamp, e.g. it is based on the last updated date instead, or you do not have access to the creation timestamp when updating, you will most likely need to take the hit and rely on delete-by-query, which you will have to call from outside Elasticsearch, e.g. through a script or cron job.

If you need to use delete-by-query, it may be worthwhile to try performing smaller deletes more frequently in order to spread out the load.
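One way to spread the load could look like the sketch below. It is only an assumption of how you might structure it: the field name `last_updated`, the index name and the helper names are hypothetical. It splits the expired time range into small windows and builds one delete-by-query request per window; `conflicts` and `requests_per_second` are existing delete-by-query request parameters, but the values shown are placeholders you would need to tune.

```python
from datetime import datetime, timedelta


def delete_windows(oldest: datetime, cutoff: datetime, step: timedelta):
    """Yield half-open (start, end) windows that together cover [oldest, cutoff)."""
    start = oldest
    while start < cutoff:
        end = min(start + step, cutoff)
        yield start, end
        start = end


def build_request(index: str, start: datetime, end: datetime,
                  field: str = "last_updated", rps: int = 500):
    """Return (path, body) for one small delete-by-query call."""
    # conflicts=proceed keeps going if a document was updated in the meantime;
    # requests_per_second throttles the delete so it does not starve searches.
    path = f"/{index}/_delete_by_query?conflicts=proceed&requests_per_second={rps}"
    body = {
        "query": {
            "range": {
                field: {"gte": start.isoformat(), "lt": end.isoformat()}
            }
        }
    }
    return path, body
```

A cron job would then loop over `delete_windows(...)`, send each `(path, body)` as a POST with whatever HTTP client you already use, and wait for it to finish before sending the next one.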

system · April 13, 2025, 7:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
