Elasticsearch disk keeps getting full when using a data stream

This morning I woke up to this:
Screenshot 2022-11-18 at 09.34.06

This is not the first time this has happened, and when it does it blocks everything: I cannot access Kibana and Elasticsearch doesn't respond at all, which means I also cannot delete data via the API or scale the instance to give it more disk space.
My only fix is usually to destroy the whole deployment and start over ...

My current set-up is the following:
I have 2 Logstash instances sending logs from 2 Kubernetes clusters (collected with Filebeat). The Logstash output is the following:

output {
  elasticsearch {
    hosts => ["<elastic search url>"]
    user => '<user>'
    password => '<password>'
    data_stream_namespace => 'production'
  }
}

They both write to a data stream. In Kibana I see that this data stream is linked to the ILM policy for logs. By default, this policy keeps everything forever. I changed it to move data to cold storage after 2 days and delete it after seven days.
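For reference, a policy like the one described (cold after 2 days, delete after 7) would look roughly like this via the ILM API. This is a sketch, not my exact policy: the policy name `logs` is the built-in default, the rollover thresholds are assumptions, and note that the `min_age` timers are measured from when the index rolls over, not from when documents were written:

```json
PUT _ilm/policy/logs
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "10gb"
          }
        }
      },
      "cold": {
        "min_age": "2d",
        "actions": {}
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```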

My issue is that this ILM policy doesn't seem to be respected. This is my cold storage instance after more than 2 days:
Screenshot 2022-11-18 at 09.42.49

No data seems to have been written to it. Similarly, I previously used an ILM policy where data should have been deleted after 2 days, and after 2 days nothing had been deleted either.

Therefore I have 3 questions:

  1. Is there a way to debug my instance when it reaches 100% disk usage? I am using Elastic Cloud.
  2. Is there a configuration to make the instance read-only when it reaches 95%?
  3. Am I missing something with data streams and ILM? Why doesn't the ILM policy seem to work?

Thank you

Hi @MyNameHasDiactrics, welcome to the community and thank you for trying Elastic Cloud.

A couple of things:

Yes, so you need to update that policy to what you want.

What do you want: to move from hot to cold after 1 day, 2 days, or by size?

How much data are you ingesting per day? How long do you want to keep it?

That is a very small instance; how much are you ingesting per day / hour?

Let us know and we can help with that...

Ok, so I found out what my issue was. I misunderstood how ILM phase changes are triggered. When a policy is set to delete data after x days, it actually means (or at least that is my understanding): delete the index x days after it was rolled over. By default, the ILM policy rolls the index over after 50 GB or 30 days, which was way too high for my usage.
I changed this and now it is working.
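For anyone debugging the same thing: the ILM explain API shows, per backing index, which phase and action it is currently in and its age, which makes it easy to see that the index has never rolled over and so the `min_age` timers have never started. The index pattern below is an assumption based on the default data stream naming:

```json
GET .ds-logs-*-production-*/_ilm/explain
```

In the response, look at the `phase`, `action`, and `age` fields for each backing index.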
