You mean logs that really were generated long ago? You need to ask yourself where they were for all the intervening time …
Usually, in my experience, this is a bad-data issue. For example, the timestamp is generated on the client: the client can be offline for ages, then reappear and empty its buffers. Or the client can simply have the wrong time.
There is nothing in it that would make Logstash use an older date.
You do not seem to be parsing the date from your logs, so the @timestamp field will have the value from when the event was processed by Logstash.
index => "%{service}-%{+YYYY.MM.dd}"
The %{+YYYY.MM.dd} comes from the @timestamp field, and from what you shared it will use the timestamp generated by Logstash.
Also, from the error you shared, the @timestamp field has the value 2025-01-24T00:15:25.899Z, so Logstash would write into an index with 2025.01.24 in the name; this is expected.
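If your logs carry their own timestamp and you want @timestamp (and therefore the index date) to reflect it, a date filter along these lines can be used. This is only a sketch: the field name log_time and the ISO8601 pattern are assumptions, since the full configuration and log format were not shared.
filter {
  date {
    # Replace "log_time" and "ISO8601" with the actual field and format in your events
    match => ["log_time", "ISO8601"]
    # @timestamp is the default target; shown explicitly for clarity
    target => "@timestamp"
  }
}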
I would assume that, since Logstash is retrying this, it has been trying to index this data since January 24th.
If I'm not wrong, 429 errors are retried indefinitely, so I suspect that your cluster reached the flood stage on January 24th, stopped indexing new data, and Logstash has been retrying to index that data ever since.
I monitor the cluster and there is no issue. The flood-stage block affects write operations, which happen only on the hot node.
On the 27th it was trying to write to the January 24th index, which had already been moved to a warm node.
Can I verify with sincedb whether the event was already sent to Elasticsearch or not?
Flood stage blocks writes to all indices that have at least one shard on the node that reached the flood stage.
Having a node in flood stage is an issue that can impact your entire cluster.
The log you shared means that the index Logstash was trying to write to was on a node that had reached the flood stage.
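To check this from the cluster side, requests along these lines can help; these are standard Elasticsearch APIs, shown as console-style requests, and the exact output depends on your version:
# Disk usage per node, to spot the node that hit the watermark
GET _cat/allocation?v
# Indices that currently carry the flood-stage write block
GET _all/_settings/index.blocks.read_only_allow_delete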
In your Logstash output you have this:
index => "%{service}-%{+YYYY.MM.dd}"
This tells Logstash to take the index name from the field service and the date from the field @timestamp. As mentioned, in the document you shared the @timestamp field has the value 2025-01-24T00:15:25.899Z, and there is no information about the field service.
This is expected: if the @timestamp field has the value 2025-01-24T00:15:25.899Z and you use a date sprintf like %{+YYYY.MM.dd}, Logstash will replace it with 2025.01.24.
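As a concrete illustration (the service value "payments" is made up, since that field was not shown):
# Hypothetical event:
#   service    => "payments"
#   @timestamp => 2025-01-24T00:15:25.899Z
#
# index => "%{service}-%{+YYYY.MM.dd}" then resolves to:
#   payments-2025.01.24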
But it is not clear whether you are parsing a date from your document into @timestamp or not, since you didn't share your full configuration.
Without seeing your full configuration it is not possible to even guess at the reason.
The sincedb only tracks the position in each file that Logstash has read; it has no information about whether the event was sent to Elasticsearch or not.
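For reference, a sincedb entry on Linux is roughly an inode, device numbers, a byte offset, and (in recent file-input versions) a last-activity timestamp and path. The values below are invented, just to show that nothing in it says whether Elasticsearch accepted the event:
# inode  major  minor  byte_offset  last_active         path
2055 0 51714 1938757 1738000000.123456 /var/log/myapp/app.log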
The original question here has been answered: sincedb can't be used to explain this specific mystery.
But if you have a backup or snapshot of the log file taken between the two dates, you can look at the actual log file from, say, the 25th or 26th and see whether the log entry that was processed on the 27th was already there.
I asked above where you think the data was between the two dates. If it was, as I suspect, sitting in the log file, then what you can do is watch the system more closely around these flood-stage warnings and get a better understanding of any wider impacts, which might be broader than you currently realize.