Correct Grok Pattern for an optional field including spaces

In some of the log messages, an optional field [opt_field=xyz] or [opt_field=] is present. I arrived at the following pattern to handle it:

(?m)%{TIMESTAMP_ISO8601:Src_CreateDate} %{LOGLEVEL:Src_Severity} \[(?<Src_Thread>[^\]]+)\] (\s*\[opt_field=%{DATA:opt_field}\]\s+)?%{JAVACLASS:Src_ClassName} - \{%{GREEDYDATA:Src_LogMsg}\}

While the pattern works (tested in grokconstructor), I would like input on whether anything in it should be changed.

Here are the 4 different sample log messages with the conditions that need to be handled:

2019-07-01 11:20:35,539 INFO [Consumer-1] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034949,\"fieldA\":\"AAAAAAAAAAAA\",\"fieldB\":\"-123456\",\"fieldC\":\"Value_C\",\"fieldD\":false}, errormsg=no record found}
2019-07-01 11:20:36,942 INFO [exec-31] [opt_field=] com.foo.webservices.es.handler.Logger - {abc=foobar, def=barfoo}
2019-07-01 11:20:35,664 INFO [Consumer-2] [opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034694,\"fieldA\":\"AAAAAAAAA\",\"fieldB\":\"-567890\"}, hid=host_id}
2019-07-01 11:20:35,664 INFO [Consumer-2][opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067]   com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034694,\"fieldA\":\"AAAAAAAAA\",\"fieldB\":\"-567890\"}, hid=host_id}

1st Msg: Optional field is not present. Pattern should match.
2nd Msg: Optional field is present but has an empty value. Pattern should match.
3rd Msg: Optional field is present. Pattern should match.
4th Msg: Optional field is present but does not have a space after thread_name. Pattern should not match.

Can someone provide insight into whether this pattern is correct and what, if anything, could be done better?

ELK Stack Version: 5.5.1

Thank you.

With 7.2 I get the match / not match that you say you want. For [opt_field=] it does match but no opt_field field is added to the event. That is as expected.


With 5.6.9 (the closest version I have to yours locally), I get the results that you say you want.

  • In general, we recommend that grok patterns are anchored (e.g., they begin with a start-of-line anchor ^ or beginning-of-input anchor \A), which allows the pattern to give up faster when it doesn't find a match starting at the beginning; without anchors, a failed match will be attempted again starting with the 2nd character in the line, and again from the third character, etc.
  • The pattern may be more performant if you use a more restricted pattern than DATA. DATA expands to the unrestricted .*?, which can match past the closing bracket and forces the regex engine to backtrack through many candidate lengths when the rest of the pattern fails. For example, I assume that you would not expect the value to contain a closing square bracket, so you could define one or more pattern definitions like so:
      pattern_definitions => {
        "NOT_CLOSE_BRACKET" => "[^\]]*"
      }
    
    OR, the pattern could be defined in pure regex, as you already do for the Src_Thread field:
    (?m)%{TIMESTAMP_ISO8601:Src_CreateDate} %{LOGLEVEL:Src_Severity} \[(?<Src_Thread>[^\]]+)\] (\[opt_field=(?<opt_field>[^\]]*)\] )?%{JAVACLASS:Src_ClassName} - \{%{GREEDYDATA:Src_LogMsg}\}
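
The two suggestions above (anchoring, and a bracket-excluding character class instead of DATA) can be checked outside Logstash with a rough sketch like the one below. It is only an approximation: the Python sub-patterns `TS`, `LEVEL`, and `JAVACLASS` are simplified stand-ins for the real grok definitions, `re.DOTALL` is assumed to play the role of Oniguruma's `(?m)`, and the helper `grok_like` is a hypothetical name that imitates grok's default behaviour of not adding empty captures to the event.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions, not the real definitions).
TS = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"
LEVEL = r"(?:TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)"
JAVACLASS = r"(?:[A-Za-z$_][A-Za-z$_0-9]*\.)*[A-Za-z$_][A-Za-z$_0-9]*"

# Anchored with \A; the optional field uses [^\]]* instead of DATA.
PATTERN = re.compile(
    r"\A(?P<Src_CreateDate>" + TS + r") (?P<Src_Severity>" + LEVEL + r") "
    r"\[(?P<Src_Thread>[^\]]+)\] "
    r"(?:\[opt_field=(?P<opt_field>[^\]]*)\] )?"
    r"(?P<Src_ClassName>" + JAVACLASS + r") - \{(?P<Src_LogMsg>.*)\}",
    re.DOTALL,  # assumed equivalent of (?m) in Oniguruma: dot also matches newlines
)

# The four sample messages from the question, in the same order.
SAMPLES = [
    r'2019-07-01 11:20:35,539 INFO [Consumer-1] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034949,\"fieldA\":\"AAAAAAAAAAAA\",\"fieldB\":\"-123456\",\"fieldC\":\"Value_C\",\"fieldD\":false}, errormsg=no record found}',
    r'2019-07-01 11:20:36,942 INFO [exec-31] [opt_field=] com.foo.webservices.es.handler.Logger - {abc=foobar, def=barfoo}',
    r'2019-07-01 11:20:35,664 INFO [Consumer-2] [opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034694,\"fieldA\":\"AAAAAAAAA\",\"fieldB\":\"-567890\"}, hid=host_id}',
    r'2019-07-01 11:20:35,664 INFO [Consumer-2][opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067]   com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034694,\"fieldA\":\"AAAAAAAAA\",\"fieldB\":\"-567890\"}, hid=host_id}',
]


def grok_like(line):
    """Return the named captures, or None when the line does not match.

    Unset and empty captures are dropped, imitating grok's default of
    not adding empty captures to the event.
    """
    match = PATTERN.match(line)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value}


if __name__ == "__main__":
    for number, line in enumerate(SAMPLES, start=1):
        fields = grok_like(line)
        if fields is None:
            print(number, "no match")
        else:
            print(number, "match, opt_field =", fields.get("opt_field"))
```

Under these assumptions the first three samples match and the fourth does not, and the second sample matches without producing an opt_field entry. Results from Logstash itself remain the authority, since the real grok definitions are broader than the stand-ins used here.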
    

Excellent suggestion Ry. Implemented it.

Do you mean like this?

 ^(?m)%{TIMESTAMP_ISO8601:Src_CreateDate} %{LOGLEVEL:Src_Severity} \[(?<Src_Thread>[^\]]+)\] (\[opt_field=(?<opt_field>[^\]]*)\] )?%{JAVACLASS:Src_ClassName} - \{%{GREEDYDATA:Src_LogMsg}\}

Also, how do you colour the code? For example, some of your code appears in red. Mind sharing?

If you use </> or indent text using four spaces, it will appear like this:

pattern_definitions => {
    "NOT_CLOSE_BRACKET" => "[^\]]*"
}

If you use three backticks on a line before and after, it will appear like this:

  pattern_definitions => {
    "NOT_CLOSE_BRACKET" => "[^\]]*"
  }
{ "Awesome" => "ThankYou @Badger " }
