Logstash S3 input plugin - filter based on time modified

I have a Logstash container that is configured to read objects from S3.
The requirement is to filter out old objects; say, objects older than 3 months should be dropped.

I noticed that I can expose the s3 metadata, so I have the following metadata in each event:

"@metadata" => {
    "s3" => {
                          "etag" => "\"xxx\"",
                "content_length" => 33,
                      "metadata" => {},
                    "version_id" => "null",
                 "accept_ranges" => "bytes",
                 "last_modified" => 2021-12-21T13:30:28.000Z,

Maybe there is a filter/ruby code that I can use in order to filter "old" objects and drop them?

Any help is appreciated!

There is a great newer filter called "age" which may work for your use case.

Just be sure to update your @timestamp with the last_modified field using the date filter first, and then run the age filter with your conditional drop statement.

date {
   match => [ "[@metadata][s3][last_modified]", "ISO8601" ]
   target => "@timestamp"
}

age {}
# One month    = 2629746 seconds
# Three months = 7889238 seconds
if [@metadata][age] > 7889238 {
    drop {}
}
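
The constants in those comments are the average Gregorian month expressed in seconds (a 365.2425-day year divided by 12). As a quick sanity check on the arithmetic (my own calculation, not from the plugin docs):

```ruby
# Average Gregorian year = 365.2425 days; divide by 12 for an average month.
SECONDS_PER_MONTH = (365.2425 * 24 * 60 * 60 / 12).round
puts SECONDS_PER_MONTH      # 2629746
puts SECONDS_PER_MONTH * 3  # 7889238
```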

Let me know if this works for you.

Thanks a lot, @AquaX! I was familiar with the "age" plugin, but I wasn't sure whether overriding the original timestamp value is a good approach. First, I will give your suggestion a try and also verify whether I have any limitations on overriding the timestamp value.

I will update...

Overriding the @timestamp field is a common practice and will allow you to actually properly see the data in a timeline from when it was originally generated.
If you want, you can first duplicate the @timestamp field into another field so you can still capture the "processed" time (I do this in my environment).
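
A minimal sketch of that idea (just an illustration; the field name `processed_time` is arbitrary):

```
mutate {
   # preserve the original ingest time before @timestamp is overwritten
   copy => { "@timestamp" => "processed_time" }
}
```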

Let me know if this solves your issue.

I agree, that sounds very reasonable. At first I thought I might have to change some existing logic that uses the timestamp... but I think this is fine.

So, I had to install the plugin as it was not installed in the image that I'm using.
I started with the approach of first parsing the field with the date filter into another field, in order to verify that the parsing is fine, but it seems like I have a problem.

First, should I use the exact path of the last_modified field? Something like:

date {
   match => [ "[@metadata][s3][last_modified]", "ISO8601" ]
   target => "s3Time"
}

Second, it seems like this fails to parse; I see in the logs:

"tags" => [
    [0] "_dateparsefailure"
]

I see it! Looking at your data again it looks like last_modified is already being recognized as a timestamp type of field as there are no " " around the value. That's good! You can just rename or copy the field then using a mutate filter. No need for the date filter at all :smiley:

mutate {
   copy => { "[@metadata][s3][last_modified]" => "s3time" }
   copy => { "@timestamp" => "processed_time" }
   copy => { "[@metadata][s3][last_modified]" => "@timestamp" }
}

Not sure if this code will work exactly as I don't have a logstash instance in front of me... play around with it.

Great, sure! I will play with it...
Thank you

Great! Please let me know if this solves your issue or mark this post as such :slight_smile:

Ok, so the direction that you gave me @AquaX was great. After some playing with Logstash and debugging, I managed to configure the age plugin using the S3 last_modified field -

In the mutate filter, in order to override @timestamp, I used copy as follows:

copy => { "[@metadata][s3][last_modified]" => "@timestamp" }

Then the age plugin performs the correct filtering based on the S3 object's time.
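
For anyone landing here later, piecing together the snippets from this thread, the whole filter section would look roughly like this (an untested sketch assembled from the posts above, not a verified config):

```
filter {
  mutate {
    # keep the original ingest time, then override @timestamp
    # with the S3 object's last-modified time
    copy => { "@timestamp" => "processed_time" }
    copy => { "[@metadata][s3][last_modified]" => "@timestamp" }
  }
  age {}
  # drop objects older than ~3 months (7889238 seconds)
  if [@metadata][age] > 7889238 {
    drop {}
  }
}
```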

@AquaX feel free to edit your last comment and I will mark it as the solution :slight_smile:

Perfect! Glad you got it figured out.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.