ILM - Delete beat data after 30 days

Hello Elastic

We are ingesting a lot of data into Elastic.
We are using index templates.
But our servers are filling up. I want to automate the cleanup of old data instead of having to watch disk usage and delete old indices manually.
I want to delete data after 30 days.
I think this is done via ILM.

I have been reading the docs about ILM and Index Management for 5 hours now, but it sounds like this: Technical Jargon Overload - YouTube, and the tutorials feel like this: https://i.imgur.com/d0dly15.png
It feels like the docs assume I am already an Elastic expert and know every intricacy of how the system works, apart from the one thing I am currently reading about. The docs also seem to assume that we use Data Streams, or that we haven't ingested any data yet. And if we have, they assume we know how to do a "re-indexing", which sounds very complicated.

I simply want to delete old data. It all feels very complicated for something that seems like it should be simple.

I have an "ILM" policy called "metricbeat", which only has a hot phase with defaults and "Delete data after this phase" selected.
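For reference, the policy as described should correspond to roughly this in Dev Tools (a sketch using Kibana's hot-phase defaults; the exact values in your cluster may differ):

```
PUT _ilm/policy/metricbeat
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```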

I have a Legacy index template called "metricbeat-7.13.0" with index pattern "metricbeat-7.13.0-*" and ILM policy "metricbeat".

The same is the case for filebeat.

Why do I still have indexes that are more than 30 days old? Shouldn't they be deleted by the policy? How do I debug this and how do I fix it?
For example, why does the 09.30 index still exist, even though it is older than 30 days?


Screenshot is from my Vagrant test environment.

I am getting the following stack trace when looking at indices with lifecycle errors under Index Management in Kibana.

java.lang.IllegalArgumentException: index.lifecycle.rollover_alias [metricbeat-7.13.0] does not point to index [metricbeat-7.13.0-2021.10.29]
	at org.elasticsearch.xpack.core.ilm.WaitForRolloverReadyStep.evaluateCondition(WaitForRolloverReadyStep.java:126)
	at org.elasticsearch.xpack.ilm.IndexLifecycleRunner.runPeriodicStep(IndexLifecycleRunner.java:176)
	at org.elasticsearch.xpack.ilm.IndexLifecycleService.triggerPolicies(IndexLifecycleService.java:333)
	at org.elasticsearch.xpack.ilm.IndexLifecycleService.triggered(IndexLifecycleService.java:271)
	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine.notifyListeners(SchedulerEngine.java:184)
	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.run(SchedulerEngine.java:217)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:831)
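To see which step an index is stuck on, the ILM explain API is useful (substitute your own index pattern):

```
GET metricbeat-7.13.0-*/_ilm/explain
```

The response lists each index with its current phase and step; for indices in the ERROR step, a step_info object carries the details of the failure, such as the rollover_alias mismatch above.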

Most of the official documentation seems to be talking about "Data Streams", but we are using index templates. How do I make sure data gets deleted after 30 days, so our Elasticsearch servers don't fill up?

The beats are going through Logstash.

The Logstash output is looking like this:

output {
  if [@metadata][pipeline] {
    elasticsearch {
      ecs_compatibility => "v1"
      pipeline          => "%{[@metadata][pipeline]}"
      ssl               => true
      cacert            => "/etc/pki/ca-trust/source/anchors/ca.crt"
      hosts             => [ "es01.sanitized:9200", "es02.sanitized:9200" ]
      index             => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      user              => "system_logstash_test"
      password          => "sanitized"
      ilm_enabled        => "true"
      ilm_pattern        => "{now/d}-000001"
      ilm_policy         => "%{[@metadata][beat]}"
      ilm_rollover_alias => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      manage_template    => false
    }
  } else {
    elasticsearch {
      ecs_compatibility => "v1"
      ssl               => true
      cacert            => "/etc/pki/ca-trust/source/anchors/_ca.crt"
      hosts             => [ "es01.sanitized:9200", "es02.sanitized:9200" ]
      index             => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      user              => "system_logstash_test"
      password          => "sanitized"
      ilm_enabled        => "true"
      ilm_pattern        => "{now/d}-000001"
      ilm_policy         => "%{[@metadata][beat]}"
      ilm_rollover_alias => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      manage_template    => false
    }
  }
}

Trying to add the Policy I get the following warning. What does it mean? Do I really have to manually add an alias for every single day and beat that I want to enroll in ILM?

You are sending data to "old style" index names with %{+YYYY.MM.dd}, so a new index is created daily, and those never go through "rollover".

Did you see in the doc about creating an index alias? I think you are ready for these steps:

  1. Bootstrap the initial ILM index and create the alias. Your index name would be "filebeat-7.13.0".
  2. Change your output statement target index removing the "-%{+YYYY.MM.dd}".

The new "filebeat-7.13.0-000001" index should start receiving data. It won't roll over until the ILM policy triggers it, and then ILM should be happy.
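The bootstrap in step 1 is a single request that creates the first index and attaches the alias as its write index (a sketch; repeat per beat with the matching names):

```
PUT filebeat-7.13.0-000001
{
  "aliases": {
    "filebeat-7.13.0": {
      "is_write_index": true
    }
  }
}
```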

There is a tool for managing the old style indices called curator here: GitHub - elastic/curator: Curator: Tending your Elasticsearch indices


One more thing: I think you will have to bootstrap every index you will use. Beats has some features for automatically creating ILM templates and initial indices, but I was never able to get them to work when variables are used, as you are doing with "%{[@metadata][beat]}-%{[@metadata][version]}".

It was a few versions back when I last worked on the issue, maybe it's been enhanced since then. Just watch for it next time you upgrade filebeat.


Create what alias exactly? And how?

Do I change the logstash output file? If so, do I change the "ilm_rollover_alias" or the "index"?

Will all the old indices be enrolled into this new ILM policy if I "bootstrap" the new index? Like, will the large indices and the old ones be deleted?

Thanks a lot for helping out, by the way. This is all very confusing to me. I just want to delete old data, but all this stuff with "alias" and "bootstrap" and "templates" and "policy" etc. I don't understand at all.

I wish there was some Elastic for beginners somewhere. But all the documentation is packed with technical jargon. And if I go to learn about a word I don't know, I just end up down a rabbit-hole that leads to even more technical words I don't know.

Like this?

I am getting an error

I searched for all indices with "000" and deleted them.

Then I changed my logstash config to:

output {
  if [@metadata][pipeline] {
    elasticsearch {
      ecs_compatibility => "v1"
      pipeline          => "%{[@metadata][pipeline]}"
      ssl               => true
      cacert            => "/etc/pki/ca-trust/source/anchors/dap_ca.crt"
      hosts             => [ "es01.sanitized:9200", "es02.sanitized:9200" ]
      index             => "%{[@metadata][beat]}-%{[@metadata][version]}"
      user              => "system_logstash_test"
      password          => "sanitized"
      ilm_enabled        => "true"
      ilm_pattern        => "000001"
      ilm_policy         => "%{[@metadata][beat]}"
      ilm_rollover_alias => "%{[@metadata][beat]}-%{[@metadata][version]}"
      manage_template    => false
    }
  } else {
    elasticsearch {
      ecs_compatibility => "v1"
      ssl               => true
      cacert            => "/etc/pki/ca-trust/source/anchors/dap_ca.crt"
      hosts             => [ "es01.sanitized:9200", "es02.sanitized:9200" ]
      index             => "%{[@metadata][beat]}-%{[@metadata][version]}"
      user              => "system_logstash_test"
      password          => "sanitized"
      ilm_enabled        => "true"
      ilm_pattern        => "000001"
      ilm_policy         => "%{[@metadata][beat]}"
      ilm_rollover_alias => "%{[@metadata][beat]}-%{[@metadata][version]}"
      manage_template    => false
    }
  }
}

And now it seems like it is working.

I used the following commands in Dev Tools to "bootstrap" it.
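(The screenshot of the commands isn't reproduced here; they were presumably along these lines, matching the ilm_rollover_alias values in the config above:)

```
PUT metricbeat-7.13.0-000001
{
  "aliases": { "metricbeat-7.13.0": { "is_write_index": true } }
}

PUT filebeat-7.13.0-000001
{
  "aliases": { "filebeat-7.13.0": { "is_write_index": true } }
}
```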

Now the data is flowing in my Vagrant setup.

The Index Templates are like this:
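(Screenshot not reproduced; a legacy template for this setup would presumably look roughly like the following, with the names assumed from the Logstash config:)

```
PUT _template/metricbeat-7.13.0
{
  "index_patterns": ["metricbeat-7.13.0-*"],
  "settings": {
    "index.lifecycle.name": "metricbeat",
    "index.lifecycle.rollover_alias": "metricbeat-7.13.0"
  }
}
```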

Time to head home. Tomorrow, if everything has been working well overnight in Vagrant, then test in Dev environment.

Thanks a lot for the pointers rugeni. Hopefully what I have now is working.

I think this implies that when ILM is enabled in the output section, "index" is ignored and the ILM settings (rollover_alias) are used. This section of the docs seems to be changing a lot recently. To me, the newer "data streams" adds more confusion; I haven't started investigating it yet. I'm retiring next year, I may not :slight_smile:


Hope you have a nice retirement once you get there.

I have only been working for 6 months now. It is all very new to me. Never had anything about Log management in the CS school I went to. I only learned front-end stuff like React and Flutter. How to improve app performance and make nice UX and UI. We only made websites and mobile apps.

At my work I manage a LOT of different stuff, and elastic is just one of those things. I never have time to dive deep into something like this.
I am wholly unqualified for this.

My last question regarding this ILM is that now that my logs are hitting these new 00001 indices, what about all the old ones?
How do I make sure that all those old indices also get deleted once they become old?
I am talking about these:

This is an example of an error:

If it's not production, I'd just delete them in Kibana index management. I don't think fixing their ILM problems would add to your useful knowledge. At that point in my history, I was using curator.


Hello @maltewhiite

Sorry to hear that you had trouble configuring your ILM policy. Thanks @rugenl for your help.

Only new indices created after you have added the policy to your index template will have the lifecycle and will be automatically deleted (now it is -000001, in a month it will be -000002, etc.).
So, as rugenl said, the best would be to manually delete the old ones that you don't need anymore.
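For example, one of the old daily indices can be removed in Dev Tools (deletes are permanent, so double-check the name first):

```
DELETE metricbeat-7.13.0-2021.09.30
```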

For the error that you are receiving about the alias, did you define the index.lifecycle.rollover_alias index setting in your template? It should match the alias that you have defined in your Logstash output, so from your config it should be

"index.lifecycle.rollover_alias": "metricbeat-7.13.0"

You can read more about it here: Tutorial: Automate rollover with ILM | Elasticsearch Guide [7.15] | Elastic

By adding this setting you are automatically defining an alias for the indices that will be created. So in a month, the new "metricbeat-7.13.0-000002" index will point to the "metricbeat-7.13.0" alias.
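You can check which concrete indices an alias currently covers, and which one is the write index, with:

```
GET _alias/metricbeat-7.13.0
```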

Let me know if you bump into more issues.

Thanks
Sébastien


Another thing to consider for space-constrained systems: delete after 30 days (usually?) means 30 days after rollover. If you roll over based on time, say 30 days, some of your data will be 60 days old before it is deleted. I use shorter rollover intervals, maybe a week, so my data gets deleted somewhere between 31-37 days.

I added the "usually" because I think there may be other options now that I don't use. Typically there is some policy to keep logs for a minimum duration, so "after rollover" is what we always use.
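A sketch of that approach, assuming the same policy name as earlier in the thread: roll over weekly, then delete 30 days after rollover, so the oldest data is removed within roughly a week past the 30-day mark:

```
PUT _ilm/policy/metricbeat
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```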


Thanks a lot for all your help.
Truly great to be working with a technology, where you can receive help like this.
Much appreciated!

Got it working in both Vagrant and Dev environment. Moving to production soon. Thanks again.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.