Grok S3 object "path"

eclipsed450 · July 14, 2019, 5:31pm

Hi all,

I'm trying to figure out a way to grok out the key of an S3 object. Here is an example key:

elasticmapreduce/j-fjtnfnfk56/containers/application_1111111111111_0169/container_1111111111111_0169_01_000029/stderr.gz

I used https://grokdebug.herokuapp.com/ and came up with the following:

%{WORD:folder}/(?<cluster>[^/]*)/(?<subfolder_name>[^/]*)/(?<application_id>[^/]*)/(?<container_id>[^/]*)/(?<file_name>[^$]*)

But when I put it in my .conf file, as follows, it doesn't parse out the key into the fields:

filter {
    grok {
        add_field => [ "file", "%{[@metadata][s3][key]}" ]
        match => { "file" => "%{WORD:folder}\/(?<cluster_id>[^/]*)\/(?<subfolder_name>[^/]*)\/(?<application_id>[^/]*)\/(?<container_id>[^/]*)\/(?<file_name>[^$]*)" }
    }
}

Even though the grok appears to work via https://grokdebug.herokuapp.com/:

{
  "folder": [ [ "elasticmapreduce" ] ],
  "cluster": [ [ "j-fjtnfnfk56" ] ],
  "subfolder_name": [ [ "containers" ] ],
  "application_id": [ [ "application_1111111111111_0169" ] ],
  "container_id": [ [ "container_1111111111111_0169_01_000029" ] ],
  "file_name": [ [ "stderr.gz\n" ] ]
}

Also, how do I get rid of the \n at the end of the file name?

Any help would be greatly appreciated.

Badger · July 14, 2019, 8:24pm

add_field will only get executed when the grok successfully completes, so the file field will not exist when it tries to match it. Matching a non-existent field is a no-op but counts as a successful completion.
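
One way to act on this, as a minimal sketch: create the field in a separate mutate filter placed before grok, so that file already exists when the match is attempted. The pattern is adapted from the question; anchoring it with ^ and using GREEDYDATA for the last component are assumptions about what is wanted, not part of the original pattern.

```
filter {
    # Runs first: copy the S3 key out of metadata into a regular field.
    mutate {
        add_field => { "file" => "%{[@metadata][s3][key]}" }
    }
    # Runs second: "file" now exists, so the match is actually attempted.
    grok {
        match => { "file" => "^%{WORD:folder}/(?<cluster_id>[^/]+)/(?<subfolder_name>[^/]+)/(?<application_id>[^/]+)/(?<container_id>[^/]+)/%{GREEDYDATA:file_name}" }
    }
}
```

If the key does not match, grok should tag the event with _grokparsefailure, which makes a pattern problem visible instead of passing silently.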

Use a literal newline in mutate to remove a newline.

mutate { gsub => [ "file_name", "
", "" ] }

eclipsed450 · July 14, 2019, 9:01pm

Thank you for that suggestion, however it only seems to have split the file field into a comma-separated line now. I tried adding the add_field option, but that didn't seem to do anything.

filter {
    grok {
        match => { "message" => "%{DATESTAMP:message_timestamp} %{LOGLEVEL:severity} %{GREEDYDATA:msg}" }
        add_field => [ "file", "%{[@metadata][s3][key]}" ]
    }
    mutate {
        copy => { "file" => "file_tmp" }
        split => [ "file_tmp" , "/" ]
        add_field => {
            "folder" => "%{file_tmp[0]}"
            "cluster" => "%{file_tmp[1]}"
            "subfolder_name1" => "%{file_tmp[2]}"
            "containers" => "%{file_tmp[3]}"
            "application_id" => "%{file_tmp[4]}"
            "container_id" => "%{file_tmp[5]}"
            "filename" => "%{file_tmp[6]}"
        }
    }
}

Also, using the code above, I got A BUNCH of these:
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference:file[0]"}
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,936][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,937][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,937][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}
[2019-07-14T20:54:07,937][WARN ][logstash.filters.mutate ] Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: file[0]"}

Also, how do you do a multi-line code block?

Badger · July 14, 2019, 10:03pm

That should be [file][0] etc.
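
A sketch of how the bracketed form might look in a configuration like the one in the previous post, assuming the file field already holds the full key. The split into three mutate blocks is an addition here, not part of the suggestion above: the mutate documentation lists a fixed processing order in which split runs before copy, so combining them in one block would try to split file_tmp before it exists. The target field names follow the six components of the example key.

```
filter {
    # Separate blocks so the operations run in the order written.
    mutate { copy => { "file" => "file_tmp" } }
    mutate { split => { "file_tmp" => "/" } }
    mutate {
        add_field => {
            # Array elements are referenced as [field][index].
            "folder"         => "%{[file_tmp][0]}"
            "cluster"        => "%{[file_tmp][1]}"
            "subfolder_name" => "%{[file_tmp][2]}"
            "application_id" => "%{[file_tmp][3]}"
            "container_id"   => "%{[file_tmp][4]}"
            "file_name"      => "%{[file_tmp][5]}"
        }
    }
}
```

If a reference cannot be resolved (for example because file_tmp is still a plain string), the sprintf text is left in the field as-is rather than replaced.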

eclipsed450 · July 14, 2019, 10:23pm

Thank you, the fields show up now, but the values of the fields are now that literal reference text rather than the path components :-/ (getting closer)

It's worth asking: I saw the dissect filter. Would that be a better solution in this case?

Badger · July 14, 2019, 11:22pm

The dissect filter is often a better fit than grok for predictably delimited events.

input { generator { count => 1 lines => [ '' ] } }
filter {
    mutate { add_field => { "path" => "elasticmapreduce/j-fjtnfnfk56/containers/application_1111111111111_0169/container_1111111111111_0169_01_000029/stderr.gz" } }
    dissect { mapping => { "path" => "%{folder}/%{cluster}/%{subfolder_name}/%{application_id}/%{container_id}/%{file_name}" } }
}
output { stdout { codec => rubydebug { metadata => true } } }

will generate an event with these fields

"subfolder_name" => "containers",
       "cluster" => "j-fjtnfnfk56",
          "path" => "elasticmapreduce/j-fjtnfnfk56/containers/application_1111111111111_0169/container_1111111111111_0169_01_000029/stderr.gz",
        "folder" => "elasticmapreduce",
     "file_name" => "stderr.gz",
"application_id" => "application_1111111111111_0169",
  "container_id" => "container_1111111111111_0169_01_000029"

However, it does not handle optional fields, or any unpredictability except padding on separators. That's why it is so fast.
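
To apply the same mapping to real S3 events instead of the generator test input, one option is to copy the key out of metadata first. This is a sketch, assuming the s3 input populates [@metadata][s3][key] as in the original question; the field name path is carried over from the example above.

```
filter {
    # Copy the object key into a regular field that dissect can read.
    mutate {
        add_field => { "path" => "%{[@metadata][s3][key]}" }
    }
    # Five fixed separators, six fields; the last field takes the remainder.
    dissect {
        mapping => { "path" => "%{folder}/%{cluster}/%{subfolder_name}/%{application_id}/%{container_id}/%{file_name}" }
    }
}
```

Keys with fewer components than the mapping expects will not dissect cleanly, which is the trade-off described above.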

eclipsed450 · July 17, 2019, 3:08am

Thank you for that breakdown. For now, I'm focusing on this absolute parent path, but I would like to modify it to read in all of the other parent directories, and an unknown number of sub-directories. Is this possible with either dissect or split?

Badger · July 17, 2019, 1:53pm

If the path has a variable number of / but always starts with a fixed number of components, you could dissect that fixed prefix, then use split on whatever is left. For example,

    dissect { mapping => { "message" => "/%{field1}/%{field2}/%{field3}/%{restOfLine}" } }
    mutate { split => { "restOfLine" => "/" } }

would handle both "/a/b/c/1/2" and "/d/e/f/7/8/9".
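
Adapted to S3 keys, which have no leading /, the same idea might look like the sketch below. It assumes the first two components (folder and cluster) are always present and that the key has already been copied into a field named path; rest_of_key is a hypothetical field name.

```
filter {
    # Fixed prefix: two named components, then everything else.
    dissect {
        mapping => { "path" => "%{folder}/%{cluster}/%{rest_of_key}" }
    }
    # Variable suffix: turn the remainder into an array of components.
    mutate {
        split => { "rest_of_key" => "/" }
    }
}
```

For the example key in this thread, rest_of_key would end up as an array of four elements: containers, the application ID, the container ID, and stderr.gz.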

eclipsed450 · July 18, 2019, 2:29pm

I'll give that a shot, thanks!

system · August 15, 2019, 2:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
