Parsing message using Grok filter

Hi,

I am trying to drop a couple of words in a message.

/test/data/user/log1/xyz

How can I drop the first two words "test" and "data" only when the message starts with "/test" and store username=user and file=log1/xyz?

Is it possible to write an if/else in a grok filter? If the message matches "/test/*", use pattern A, else use pattern B?

I am trying to parse them using the grok filter but cannot come up with the right regex for it.

Yes. Ruby regular expressions support positive and negative lookahead and lookbehind assertions, as well as alternation. This allows you to write complex if-elsif-elsif-else tests into your patterns. However, such patterns are hard to maintain: anyone who has to modify such a grok will probably rewrite it from scratch.
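As a rough illustration of that kind of in-pattern branching, a single pattern could combine alternation with a negative lookahead. This is only a sketch: someField is a placeholder, and the "else" branch (any path that does not start with /test/) is an assumption, not something stated in the question.

```logstash
filter {
  grok {
    # Branch 1: consume the literal "/test/data/" prefix.
    # Branch 2: consume a single "/" only when it is NOT followed by "test/".
    # Either way, the next path segment becomes "user" and the rest "fileName".
    match => { "someField" => "^(?:/test/data/|/(?!test/))(?<user>[^/]+)/%{GREEDYDATA:fileName}" }
  }
}
```

Even this two-branch version takes some effort to read, which is the maintenance concern in a nutshell.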

You can do the test in the filter section using a conditional. For example:

if [someField] =~ /^\/test\// {
      grok { match => [...] }
} else {
      grok { match => [...] }
}

If you are able to order your patterns such that only one applies then you can have grok try them all by supplying an array of patterns to the match option.

You could start with

grok { match => { "someField" => "^/test/data/(?<user>[^/]+)/%{GREEDYDATA:fileName}" } }
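Building on that pattern, the array form of match might look roughly like the following. The second pattern is a hypothetical "pattern B" made up for illustration; substitute whatever your non-/test messages actually need.

```logstash
filter {
  grok {
    # Patterns are tried in order. With the default break_on_match => true,
    # grok stops at the first pattern that matches.
    match => {
      "someField" => [
        "^/test/data/(?<user>[^/]+)/%{GREEDYDATA:fileName}",
        "^/(?<user>[^/]+)/%{GREEDYDATA:fileName}"
      ]
    }
  }
}
```

The more specific pattern goes first, because the generic one would also match paths that start with /test/.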

Thanks for the info.

I have a single message which is composed of multiple values joined by pipes.
20190615|4|method|userend|/test/123/1.1|500|2
Now my condition would be: if the 5th value in a message starts with "/test", use grok filter1, else use grok filter2.

I am trying to get the exact regex for it. Would the below work?

^(.+?)\|(.+?)\|(.+?)\|(.+?)\|\/test\/.+?\|(.+?)\|.+$

How can I use this in a condition to check whether a message has /test as its 5th value?

Thanks

grok is not the only tool in the toolbox.

    mutate { split => { "message" => "|" } }
    if [message][4] =~ /^\/test/ {
        [...]
    }
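Along the same lines, one possible alternative for a fixed pipe-delimited layout is the dissect filter, which splits on literal delimiters without any regular expressions. A sketch, assuming every line has exactly seven fields; the field names are chosen for illustration, and the gsub step is just one way to strip the prefix in place:

```logstash
filter {
  # Split the line on the literal "|" delimiters into named fields.
  dissect {
    mapping => {
      "message" => "%{timestamp_local}|%{duration}|%{requesttype}|%{username}|%{resource}|%{statuscode}|%{bytes}"
    }
  }
  # Only touch events whose fifth value starts with "/test".
  if [resource] =~ /^\/test/ {
    # Remove the leading "/test" from the field, keeping the remainder.
    mutate { gsub => [ "resource", "^/test", "" ] }
  }
}
```

Because dissect leaves [message] untouched as a string, later filters can still work on the whole line.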

I tried this pattern, but I got a grok parse failure:

filter {
    mutate { split => { "message" => "|" } }
    if [message][4] =~ /^\/test/ {
        grok {
            # Enable multiple matchers
            break_on_match => false

            match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{DATA:username}\|%{DATA:resource}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }

            # Extract repo and path
            match => { "resource" => "/%{DATA:repo}/%{GREEDYDATA:resource_path}"}

            # Extract resource name
            match => { "resource_path" => "(?<resource_name>[^/]+$)" }
        }
    }
}

Output:

{"@version":"1","@timestamp":"2019-06-27T14:00:48.450Z","path":"/Users/testing/ai.log","host":"SI-M-C6G5","message":["20190615","4","method","userend","/test/123/1.1","500","2"],"tags":["_grokparsefailure"]}

The split option of mutate converts a string into an array. So after the split, message looks like this:

   "message" => [
    [0] "20190615",
    [1] "4",
    [2] "method",
    [3] "userend",
    [4] "/test/123/1.1",
    [5] "500",
    [6] "2"
   ]

If you want to be able to grok the entire message field, then copy it to another field before splitting it:

mutate { add_field => { "[@metadata][copyOfMessage]" => "%{[message]}" } }
mutate { split => { "[@metadata][copyOfMessage]" => "|" } }
if [@metadata][copyOfMessage][4] =~ /^\/test/ {
    [...]
}

Thanks for the info. I apologize if I did not specify my requirement correctly. Just to reiterate

20190615|4|method|userend|/test/123/1.1|500|2

If the 5th value in the message starts with "/test", I need to drop it and store "/123/1.1" in a field.

The split does help me check whether I have "/test" as the 5th value, but how would I drop "/test" and store the rest in a field? I thought grok would be the only way to do it.

filter {
    mutate { add_field => { "[@metadata][copyOfMessage]" => "%{[message]}" } }
    mutate { split => { "[@metadata][copyOfMessage]" => "|" } }
    if [@metadata][message][4] =~ /^\/test/ {
        grok {
            match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{DATA:username}\|%{DATA:resource}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }

            # Extract repo and path
            match => { "resource" => "/%{DATA:repo}/%{GREEDYDATA:resource_path}"}

            # Extract resource name
            match => { "resource_path" => "(?<resource_name>[^/]+$)" }
        }
    }
}

I tried copying the message into a new field and then parsing it.

Output:

{
          "host" => "SI-M-C6G5",
      "@version" => "1",
    "@timestamp" => 2019-06-27T14:40:05.914Z,
       "message" => "20190615|4|method|userend|/test/123/1.1|500|2",
          "path" => "/Users/testing/ai.log"
}

It works if you split it into 3 groks:

    grok { match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{DATA:username}\|%{DATA:resource}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" } }
    grok { match => { "resource" => "/%{DATA:repo}/%{GREEDYDATA:resource_path}"} }
    grok { match => { "resource_path" => "(?<resource_name>[^/]+$)" } }

filter {
    mutate { add_field => { "[@metadata][copyOfMessage]" => "%{[message]}" } }
    mutate { split => { "[@metadata][copyOfMessage]" => "|" } }
    if [@metadata][message][4] =~ /^\/test/ {
        grok { match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{DATA:username}\|%{DATA:resource}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" } }
        grok { match => { "resource" => "/(?<repo>[^\/]+)/%{GREEDYDATA:resource_path}"} }
        grok { match => { "resource_path" => "(?<resource_name>[^/]+$)" } }
    }
}

I tried using multiple groks but still see the same output:

{
          "host" => "SI-M-C6G5",
      "@version" => "1",
    "@timestamp" => 2019-06-27T15:36:34.446Z,
       "message" => "20190615|4|method|userend|/test/123/1.1|500|2",
          "path" => "/Users/testing/ai.log"
}

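In the conditional, replace [message] with [copyOfMessage]. The split array is stored in [@metadata][copyOfMessage], so [@metadata][message][4] does not exist and the condition never matches.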
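Putting that together with the requirement to drop "/test" and keep the rest, the filter section might look something like this. It is a sketch rather than a tested config, and resource_rest is a made-up field name:

```logstash
filter {
  # Copy the raw line so the original [message] stays a string for grok.
  mutate { add_field => { "[@metadata][copyOfMessage]" => "%{[message]}" } }
  mutate { split => { "[@metadata][copyOfMessage]" => "|" } }

  # Test the fifth element (index 4) of the split copy.
  if [@metadata][copyOfMessage][4] =~ /^\/test/ {
    grok { match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{DATA:username}\|%{DATA:resource}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" } }
    # Skip the leading "/test" and capture the remainder, e.g. "/123/1.1".
    grok { match => { "resource" => "^/test%{GREEDYDATA:resource_rest}" } }
  }
}
```

Fields under [@metadata] are not sent to outputs by default, so the split copy does not clutter the event.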

Thanks a lot @Badger

Just a small question,

Why does it work when the groks and mutates are split into separate filters, but not when all the match options are in a single grok filter?

I am not sure. I do know that specifying the same option to a filter multiple times often works, but sometimes does not. It is very confusing, so I avoid doing it.
