Can Rollover API / ILM be used to keep only x days data in an index at any point of time

Aditya1996 · August 22, 2023, 7:47pm

I have read a few threads on this question. Most of them suggest using delete-by-query, because the Rollover API seems to delete indices (or move them to another phase) and create new ones based on a given condition.

I could not find a way to make Rollover / ILM behave like delete-by-query, which works as shown below:

POST my-index/_delete_by_query?conflicts=proceed
{
 "query": {
  "range": {
   "@timestamp": {
    "lt": "now-90d/d"
   }
  }
 }
}

Christian_Dahlqvist · August 22, 2023, 8:01pm

ILM works by deleting complete indices. It cannot be used to delete data from within an index. If you need to delete some data from within an index, delete-by-query is what you need. You will need to trigger and run these requests yourself, though. There is no way to schedule them from within Elasticsearch.
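Since the delete-by-query request has to be triggered from outside Elasticsearch, one option is a small script run by an external scheduler such as cron. The following is only a minimal sketch using the Python standard library. The cluster URL `http://localhost:9200`, the absence of authentication, and the helper names `build_delete_by_query` and `run_delete_by_query` are assumptions for illustration; the index name and query come from the original question.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; adjust the URL and add auth as needed.
ES_URL = "http://localhost:9200"


def build_delete_by_query(index, retention_days):
    """Build the path and body of a delete-by-query request that removes
    documents whose @timestamp is older than the retention period."""
    path = f"/{index}/_delete_by_query?conflicts=proceed"
    body = {
        "query": {
            "range": {
                "@timestamp": {"lt": f"now-{retention_days}d/d"}
            }
        }
    }
    return path, body


def run_delete_by_query(index, retention_days):
    """Send the request to the cluster (hypothetical helper, not called here)."""
    path, body = build_delete_by_query(index, retention_days)
    request = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the request so the sketch runs without a cluster.
    path, body = build_delete_by_query("my-index", 90)
    print("POST", path)
    print(json.dumps(body, indent=1))
```

Calling `run_delete_by_query("my-index", 90)` once a day from a scheduler would reproduce the request from the first post.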

Aditya1996 · August 23, 2023, 3:51pm

But delete-by-query usually isn't regarded as an efficient solution.
@theuntergeek mentioned this in the post "What is the definitive way of only retaining 7 days of logs".

Christian_Dahlqvist · August 23, 2023, 3:52pm

No, that is true. Using delete-by-query is a lot less efficient compared to deleting time-based indices.

Aditya1996 · August 23, 2023, 3:52pm

He mentioned: "I highly recommend looking into the Rollover API for a way to simplify this for you. Then you can make your "non-time-series" index into a time-series index for all intents and purposes."

I didn't quite understand how Rollover can help in this use case.

Christian_Dahlqvist · August 23, 2023, 3:53pm

Do you have immutable data or do you perform updates?

Aditya1996 · August 23, 2023, 3:53pm

It's a simple log index, so it is append-only (no updates to documents once they are indexed).

Christian_Dahlqvist · August 23, 2023, 3:54pm

In that case I would recommend you switch to using rollover and ILM, ideally through the use of data streams.
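As a rough illustration of that recommendation, the sketch below builds the JSON bodies for an ILM policy and a data stream index template and prints them as console-style requests. The policy name `logs-retention`, the template name `logs-template`, the pattern `logs-app-*`, and the 1-day / 30-day values are all assumptions chosen for the example, not settings from this thread.

```python
import json


def retention_policy(rollover_max_age, delete_min_age):
    """ILM policy body: roll over the write index in the hot phase,
    then delete each index once it is old enough."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {"rollover": {"max_age": rollover_max_age}}
                },
                "delete": {
                    # With rollover, min_age is counted from the rollover time.
                    "min_age": delete_min_age,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def data_stream_template(pattern, policy_name):
    """Index template body that creates data streams using the policy."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "template": {"settings": {"index.lifecycle.name": policy_name}},
    }


if __name__ == "__main__":
    # Hypothetical names and values, for illustration only.
    print("PUT _ilm/policy/logs-retention")
    print(json.dumps(retention_policy("1d", "30d"), indent=1))
    print("PUT _index_template/logs-template")
    print(json.dumps(data_stream_template("logs-app-*", "logs-retention"), indent=1))
```

A shorter rollover period means each deleted index removes a smaller slice of data, so the amount retained stays closer to the target, at the cost of more indices.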

Aditya1996 · August 23, 2023, 3:56pm

Yes, but again, what Rollover does is delete the existing index or move it to the next phase based on a condition, and create a new index as the write index, right? That way, a single index would not hold, say, the past 30 days of data at any given time.

For example, let's say this is the policy below, with the flow of indices shown in the next image.

Christian_Dahlqvist · August 23, 2023, 4:00pm

With rollover you have a set of indices that hold the data covering the retention period, and you query all of them at the same time through an index pattern or alias. Once the oldest index only contains data that is beyond the retention period, it will be deleted. This means that at any point in time you may have a bit more data available than your retention period specifies, but that is generally not a problem.
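The behaviour described above can be illustrated with a toy simulation. This is a simplified model, not how ILM is implemented: it assumes time is counted in whole days, that rollover happens exactly every `rollover_days`, and that an index is deleted as soon as `retention_days` have passed since it rolled over. The function name `surviving_indices` is hypothetical.

```python
def surviving_indices(today, rollover_days, retention_days):
    """Return (start_day, end_day) for every index that still exists on `today`.

    end_day is the day the index rolled over, or None for the current
    write index. An index is dropped once `retention_days` have passed
    since its rollover.
    """
    indices = []
    start = 0
    while start <= today:
        end = start + rollover_days  # day on which this index rolls over
        if end > today:
            indices.append((start, None))  # still the write index
        elif today - end < retention_days:
            indices.append((start, end))  # rolled over, not yet expired
        start = end
    return indices


if __name__ == "__main__":
    for today in (10, 11):
        indices = surviving_indices(today, rollover_days=2, retention_days=2)
        oldest_day = indices[0][0]
        print(f"day {today}: indices={indices}, data kept back to day {oldest_day}")
```

In this model the oldest available document is always between `retention_days` and `retention_days + rollover_days` old, which is the "bit more data" mentioned above.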

I do not see what the issue is. This is how most people manage retention in Elasticsearch.

Aditya1996 · August 23, 2023, 4:11pm

So let's consider the policy and the flow of indices through ILM (seen in the 2 images I sent in my previous post).
Let's assume I want the logs index to always keep the last 2 days of data (yesterday and today), just like on Amazon, where at any time you can see your orders from the last 3 months.

The hot phase has a max age of 2 days for rolling over to the warm phase.

So on day 1 and day 2, if I need to view the last 2 days of data, I have only 1 index/data stream under the alias name "logs_index", for example.

On day 3, when rollover happens and a new index is created, the older one goes to the warm phase.
Now if I need to view the last 2 days of data, I would have to query the new index and some part of the old index.

Similarly, on day 7 I would have to query index no. 4 and index no. 3, and so on, based on the policy in the image in my previous comment.

So how would Elasticsearch know which physical indices to query to get exactly the last 2 days of data based on timestamps?

Christian_Dahlqvist · August 23, 2023, 4:27pm

If you are viewing your data through Kibana and set the time picker to last 2 days, Kibana will query all the indices backing the data stream/index pattern with a time filter added to the query. This means data will in practice only be returned from the last 2 indices. Querying a set of indices that does not hold any relevant data based on the time range is very fast so it is not a performance issue.

Aditya1996 · August 23, 2023, 4:53pm

No, we don't use Kibana in production. Queries are made through the Java Elasticsearch REST client.

Christian_Dahlqvist · August 23, 2023, 6:23pm

OK, then you query the full index pattern and add a timestamp range clause to your queries to filter out the correct data.
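A minimal sketch of such a query body is shown below, built as a Python dictionary so it can be checked without a cluster. The target name `logs_index` is the alias name used earlier in the thread; the helper name `last_n_days_search` and the choice of a `bool` filter are assumptions for illustration. The same JSON body could be sent from any client, including the Java REST client.

```python
import json


def last_n_days_search(target, days):
    """Build the path and body of a search limited to the last `days` days.

    `target` can be a data stream name, an alias, or a wildcard pattern.
    """
    path = f"/{target}/_search"
    body = {
        "query": {
            "bool": {
                # A filter clause restricts results without affecting scoring.
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{days}d"}}}
                ]
            }
        }
    }
    return path, body


if __name__ == "__main__":
    path, body = last_n_days_search("logs_index", 2)
    print("GET", path)
    print(json.dumps(body, indent=1))
```

Date math rounding such as `/d` (as in the delete-by-query example in the first post) can be appended to align the window to day boundaries. A search against an alias or data stream name is sent to all indices behind it; only writes are directed to the single write index.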

Aditya1996 · August 23, 2023, 6:44pm

Do you mean I cannot query using the alias name, because only one active write index sits under the alias in the hot phase and only that one would be queried? In other words, can an alias only query the index in the hot phase?

If so, to query the other indices as well, I would need to use a name pattern (regex).

