Elasticsearch indices are being created for older dates

Hi,

We are hosting the ELK Stack on a VM and using Index Lifecycle Policies to manage log retention for different environments. However, we are currently facing an issue.

We have configured a 4-day retention policy, and our indices are created in the format sm-pro-api-2026-03-30. Ideally, only the last 4 days of logs should be retained. However, due to a recent issue, older indices are being recreated and logs are continuously getting indexed into older dates again. This is causing storage and utilization concerns.

At the moment, we are unable to identify the root cause of this behavior. Could someone from the team please help us investigate and resolve this issue?

Hello and welcome,

You need to provide more context on how you are indexing your data.

Are you using Logstash? If yes, please share your configuration pipeline.

We are using Logstash, but I don't think the data is getting indexed through Logstash. It is being indexed through the Elasticsearch server API endpoint. Anyway, sharing the Logstash configuration pipeline.

Just changed the password for security reasons.

root@sm-elasticstack:/etc/logstash/conf.d# cat 02-beats-input.conf
input {
  beats {
    port => 5044
  }
}
root@sm-elasticstack:/etc/logstash/conf.d# cat 30-elasticsearch-output.conf
output {
  if [@metadata][pipeline] {
    elasticsearch {
      hosts => ["localhost:9200"]
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      pipeline => "%{[@metadata][pipeline]}"
      user => "elastic"
      password => "dbededbewdiebdiudew"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      manage_template => false
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      user => "elastic"
      password => "dbweidbedejddwdjkwd"
    }
  }
}

If you are using Logstash, then the requests to index the data will be coming from Logstash.

Your issue is here:

index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"

The last part, %{+YYYY.MM.dd}, gets its value from the @timestamp field of each event, so if you are reading older events, they will be indexed into indices for those older dates.

You didn't share your filters, so it is not possible to know if you are parsing the @timestamp field from your events or using the one that Beats sends.

But this is your main issue: you are using time-based indices, and for some reason you are reading old data.
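If the goal is for the index name to reflect when the event was *processed* rather than the event's own @timestamp, one option (a sketch, not your exact pipeline; the [@metadata][index_day] field name is illustrative) is to compute the date in a ruby filter and reference it in the output instead of the %{+YYYY.MM.dd} sprintf format:

```
filter {
  # Store the current UTC date in @metadata so it is not indexed with the event
  ruby {
    code => "event.set('[@metadata][index_day]', Time.now.utc.strftime('%Y.%m.%d'))"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    # Processing-time date instead of %{+YYYY.MM.dd}, which reads @timestamp
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{[@metadata][index_day]}"
  }
}
```

Note the trade-off: with this approach, events for the same day can span two indices depending on when they were ingested, which is usually acceptable when retention is the main concern.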

However, due to a recent issue

What was the recent issue?

Or “new” data that has “old” timestamps therein.

This can happen lots of ways, one is devices which were offline for a while, then appear online and send their stored logs for last N days/weeks/years. Or parsing errors, or just bugs, or log rotation errors, or … One can sometimes see indices from the future (well!) for similar reasons.

A lot of people add an ingest_timestamp field via an ingest pipeline, often just to measure lag but can be useful in other ways too.
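As a sketch of that idea (the pipeline name and field name here are illustrative), an ingest pipeline with a set processor can record the ingestion time, which you can then compare against @timestamp to spot lag or replayed old data:

```
PUT _ingest/pipeline/add-ingest-timestamp
{
  "description": "Record when the document was ingested, for comparison with @timestamp",
  "processors": [
    {
      "set": {
        "field": "ingest_timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

You would then reference this pipeline from the indexing request, an index setting such as index.default_pipeline, or the Logstash output's pipeline option.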

The timestamp is coming from the code side.

elasticsearchSinkOptions.EmitEventFailure = EmitEventFailureHandling.WriteToSelfLog | EmitEventFailureHandling.WriteToFailureSink | EmitEventFailureHandling.RaiseCallback;
elasticsearchSinkOptions.IndexFormat = "sm-" + environment?.ToLower().Replace(".", "-").Substring(0, 3) + "-" + text + "-" + DateTime.UtcNow.ToString("yyyy-MM-dd");

May I know what these filters are and where I can find them?

You didn't share your filters, so it is not possible to know if you are parsing the @timestamp field from your events or using the one that Beats sends.

But this is your main issue, you are using time-based indices and for some reason you are reading old data.

Which code is this? It is not clear; you need to provide context.

From what you shared, you are indexing your data using Beats (Filebeat, I'm assuming) and Logstash. If there is anything else in this ingestion flow, you need to share it.

Logstash pipelines have inputs, filters, and outputs; you shared only one input and one output, so you need to share the full configuration.

Without context on how you are ingesting your data, it is pretty complicated to provide any feedback.

Please share your full Logstash pipeline and provide context if you are indexing your data any other way.

From what you shared, your issue seems to be related to indexing old data, or new data with old dates.
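To confirm which of those it is, you can check whether the older indices are still actively receiving writes. For example, from Kibana Dev Tools (the index pattern here is an assumption based on your naming):

```
# List matching indices sorted by creation date, with doc counts and size
GET _cat/indices/sm-pro-api-*?v&h=index,creation.date.string,docs.count,store.size&s=creation.date

# Per-index indexing stats: an index_total that keeps growing on an
# old-dated index means something is still writing documents to it
GET sm-pro-api-2026-03-30/_stats/indexing
```

If an old-dated index has a recent creation date, it was recreated after ILM deleted it, which points at a client still generating that index name or replaying old events.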