Custom grok pattern issue

I am testing custom grok patterns.
I started with the following Logstash config:

input {
    stdin{}
}

filter {
    grok {
        match => { "message" => "%{SYSLOGTIMESTAMP:time} %{GREEDYDATA:other}" }
    }

}
    
output {
    stdout {
        codec => rubydebug
    }
}

When I submit

May  8 06:47:27 aef46fa42c11[1036]: 29.22.234.151

the result is as expected:

{
      "@version" => "1",
       "message" => "May  8 06:47:27 aef46fa42c11[1036]: 29.22.234.151",
          "host" => "scw-8ccfeb",
    "@timestamp" => 2018-05-08T04:52:55.115Z,
          "time" => "May  8 06:47:27",
         "other" => "aef46fa42c11[1036]: 29.22.234.151"
}

But when I add a custom pattern to the match, it no longer works:

match => {"message" => "%{SYSLOGTIMESTAMP:time} (?<clientid>[a-z0-9]+\[[0-9]+\]) %{GREEDYDATA:other}"}

The same input produces a _grokparsefailure:

May  8 06:47:27 aef46fa42c11[1036]: 29.22.234.151

Result:

{
          "host" => "scw-8ccfeb",
      "@version" => "1",
          "tags" => [
        [0] "_grokparsefailure"
    ],
       "message" => "May  8 06:47:27 aef46fa42c11[1036]: 29.22.234.151",
    "@timestamp" => 2018-05-08T04:58:50.598Z
}

I have tested the regular expression (on http://rubular.com/ as well as https://regex101.com/) and it looks good.

I suspect the problem is in how I wrote it in the match expression.

Any hint is appreciated.

Always format Logstash configuration as preformatted text so it doesn't get mangled.

The problem is that you're not taking the colon after the program/pid into account.
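To illustrate the point about the colon, here is a minimal Python sketch. It is only an approximation: grok uses the Oniguruma regex engine, while this uses Python's re module, so named groups are written (?P<name>...) instead of (?<name>...). The SYSLOGTIMESTAMP and GREEDYDATA strings are simplified stand-ins assumed for this sketch, not the real grok pattern definitions.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions for this sketch,
# not the actual grok definitions).
SYSLOGTIMESTAMP = r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"
GREEDYDATA = r".*"

line = "May  8 06:47:27 aef46fa42c11[1036]: 29.22.234.151"

# Pattern from the question: a space is expected right after the closing
# bracket, but the log line has a colon there.
without_colon = re.compile(
    rf"(?P<time>{SYSLOGTIMESTAMP}) (?P<clientid>[a-z0-9]+\[[0-9]+\]) (?P<other>{GREEDYDATA})"
)

# Same pattern with a literal colon between the bracket and the space.
with_colon = re.compile(
    rf"(?P<time>{SYSLOGTIMESTAMP}) (?P<clientid>[a-z0-9]+\[[0-9]+\]): (?P<other>{GREEDYDATA})"
)

failed = without_colon.search(line)
matched = with_colon.search(line)

print(failed)                     # None: the colon is not accounted for
print(matched.group("clientid"))  # aef46fa42c11[1036]
print(matched.group("other"))     # 29.22.234.151
```

The first pattern cannot match because nothing in it consumes the ":" that follows "[1036]", which is the situation that leads to the _grokparsefailure tag in Logstash.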

Thanks @magnusbaeck for your reply.

Indeed, if I include the colon ":" in the expression, I get no grok failure, but the sourceid field now contains the colon:

May  8 04:31:02 aef46fa42c11[1036]: 170.66.228.69 - - 
{
          "time" => "May  8 04:31:02",
      "clientip" => "170.66.228.69",
       "message" => "May  8 04:31:02 aef46fa42c11[1036]: 170.66.228.69 - - ",
         "other" => "- - ",
      "@version" => "1",
    "@timestamp" => 2018-05-08T14:37:29.054Z,
      "sourceid" => "aef46fa42c11[1036]:",
          "host" => "scw-4bed5f"
}

I have tried to replace ":" with "" using mutate, but it doesn't work.

grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:time} (?<sourceid>[a-z0-9]+\[[0-9]+\]\:) %{IP:clientip} %{GREEDYDATA:other}" }
}
mutate {
    gsub => [
        "sourceid", ":", ""
    ]
}

The result is the same:

{
          "time" => "May  8 04:31:02",
      "clientip" => "170.66.228.69",
       "message" => "May  8 04:31:02 aef46fa42c11[1036]: 170.66.228.69 - - ",
         "other" => "- - ",
      "@version" => "1",
    "@timestamp" => 2018-05-08T14:37:29.054Z,
      "sourceid" => "aef46fa42c11[1036]:",
          "host" => "scw-4bed5f"
}

Can I indicate in the grok match that the colon should be ignored?

I tried adding the colon just after the custom pattern, but it gives a _grokparsefailure:

grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:time} (?<sourceid>[a-z0-9]+\[[0-9]+\]): %{IP:clientip} %{GREEDYDATA:other}" }
}

Got it by escaping the colon outside of the pattern!

filter {
    grok {
        match => { "message" => "%{SYSLOGTIMESTAMP:time} (?<sourceid>[a-z0-9]+\[[0-9]+\])\: %{IP:clientip} %{GREEDYDATA:other}" }
    }
}

Correct result:

May  8 04:31:02 aef46fa42c11[1036]: 170.66.228.69 - -
{
          "time" => "May  8 04:31:02",
         "other" => "- -",
      "clientip" => "170.66.228.69",
          "host" => "scw-4bed5f",
      "@version" => "1",
    "@timestamp" => 2018-05-08T18:21:08.877Z,
       "message" => "May  8 04:31:02 aef46fa42c11[1036]: 170.66.228.69 - -",
      "sourceid" => "aef46fa42c11[1036]"
}

Thanks @magnusbaeck!

The colon obviously needs to go outside the parenthesized group so that it isn't captured into the field, but there's no point in escaping it. Colons have no special meaning in regular expressions.
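Both claims can be checked with a short Python sketch. It is an approximation, since grok uses Oniguruma rather than Python's re module, and the named-group syntax is therefore (?P<name>...) instead of (?<name>...). The sample text is the program/pid and IP portion of the log line from this thread.

```python
import re

text = "aef46fa42c11[1036]: 170.66.228.69"

# Colon inside the named group: it becomes part of the captured value.
inside = re.search(r"(?P<sourceid>[a-z0-9]+\[[0-9]+\]:) ", text)

# Colon outside the group, escaped: matched but not captured.
outside_escaped = re.search(r"(?P<sourceid>[a-z0-9]+\[[0-9]+\])\: ", text)

# Colon outside the group, unescaped: behaves the same as the escaped form.
outside_plain = re.search(r"(?P<sourceid>[a-z0-9]+\[[0-9]+\]): ", text)

print(inside.group("sourceid"))           # aef46fa42c11[1036]:
print(outside_escaped.group("sourceid"))  # aef46fa42c11[1036]
print(outside_plain.group("sourceid"))    # aef46fa42c11[1036]
```

In this sketch the escaped and unescaped colons give the same capture, so only the position of the colon relative to the group changes the field value.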
