I have a Logstash container that is configured to read objects from S3.
The requirement is to filter out old objects; let's say objects older than 3 months should be dropped.
I noticed that I can expose the S3 metadata, so I have the following metadata in each event:
There is a great, relatively new filter called "age" which may work for your use case.
Just be sure to update your @timestamp with the last_modified field using the date filter first, and then run the age filter with your conditional drop statement.
filter {
  # Set @timestamp from the S3 object's last_modified date
  date {
    match  => [ "[@metadata][s3][last_modified]", "ISO8601" ]
    target => "@timestamp"
  }

  # Writes the event's age in seconds to [@metadata][age]
  age {}

  # One month    = 2629746 seconds
  # Three months = 7889238 seconds
  if [@metadata][age] > 7889238 {
    drop {}
  }
}
Thanks a lot, @AquaX! I was familiar with the "age" plugin, but I wasn't sure whether overriding the original timestamp value is a good approach. I'll give your suggestion a try and also check whether there are any limitations on overriding the timestamp value.
Overriding the @timestamp field is common practice and will allow you to properly see the data on a timeline from when it was originally generated.
If you want, you can first copy the @timestamp field into another field so that you still capture the "processed" time (I do this in my environment).
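A minimal sketch of that pattern; the processed_at field name is just an illustration, not something from this thread:

filter {
  # Preserve the original ingest time before @timestamp is overwritten
  # ("processed_at" is a hypothetical field name used for this example)
  mutate {
    copy => { "@timestamp" => "processed_at" }
  }

  # Now @timestamp can safely be replaced with the S3 last_modified date
  date {
    match  => [ "[@metadata][s3][last_modified]", "ISO8601" ]
    target => "@timestamp"
  }
}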
I agree, that sounds very reasonable. At first I thought I might have to change some existing logic that uses the timestamp... but I think this is fine.
So, I had to install the plugin, as it was not installed in the image that I'm using.
I started by first parsing the field with the date filter into another field, in order to verify that the parsing is fine, but it seems like I have a problem.
First, should I use the exact path of the last_modified field? Something like
date {
  match  => [ "[@metadata][s3][last_modified]", "ISO8601" ]
  target => "s3Time"
}
No?
Second, it seems like it fails to parse; I see this in the logs:
I see it! Looking at your data again, it looks like last_modified is already being recognized as a timestamp type of field, as there are no quotation marks around the value. That's good! You can just rename or copy the field using a mutate filter; no need for the date filter at all.
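A sketch of what that copy could look like, reusing the s3Time field from your attempt above:

filter {
  # last_modified is already a timestamp type, so a plain copy is enough
  mutate {
    copy => { "[@metadata][s3][last_modified]" => "s3Time" }
  }
}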
Ok, so the direction that you gave me, @AquaX, was great. After some playing with Logstash and debugging, I managed to configure the age plugin using the S3 last_modified.
In the mutate, in order to override @timestamp, I used copy as follows:
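A minimal sketch of that mutate-plus-age combination, assuming last_modified is already recognized as a timestamp type as noted above:

filter {
  # Override @timestamp directly with the S3 object's last_modified value
  mutate {
    copy => { "[@metadata][s3][last_modified]" => "@timestamp" }
  }

  # [@metadata][age] now reflects the object's age in seconds
  age {}

  # Drop anything older than roughly three months (7889238 seconds)
  if [@metadata][age] > 7889238 {
    drop {}
  }
}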