Make a grok pattern for a field that might be missing

Hi! There are logs in the following format:
2023-03-05 17:07:01.586+0000 [L: WARN] [O: A.b.c.d.e.FGScript] [I: ] [U: email@example.com] [S: ] [P: ] [T: ABCProcessor-23 ] @@@ aboba=5 beboba=1 ceboba=4

So I have a correctly working grok pattern:

%{TIMESTAMP_ISO8601:timestamp} \[%{GREEDYDATA}: %{LOGLEVEL:logLevel}\] \[%{GREEDYDATA}: (?<O>([A-Za-z][.])+([A-Za-z0-9])+)\] \[%{GREEDYDATA}: (?<I>([A-Za-z])?)\] \[%{GREEDYDATA}: (?<U>([a-zA-Z0-9._?@-])+)\] \[%{GREEDYDATA}: (?<S>([A-Za-z])?)\] \[%{GREEDYDATA}: (?<P>([A-Za-z])?)\] \[%{GREEDYDATA}: (?<T>([A-Za-z0-9_-])+)\] %{GREEDYDATA:logMessage}

But the "[P: ]" field might be absent from some log lines, and then parsing fails with the _grokparsefailure tag.

I've tried the following for that field to handle this situation:

(?:(\[%{GREEDYDATA}: (?<P>([A-Za-z])?)\])?
(?:\[%{GREEDYDATA}: (?<P>([A-Za-z])?)\])?
(\[%{GREEDYDATA}: (?<P>([A-Za-z])?)\]){0,1}
(\[%{GREEDYDATA}: ){0,1}((?<P>([A-Za-z])?)\]){0,1}

...and many other nearly identical expressions, but it only works when [P: ] is present.

What am I doing wrong?
Thanks in advance

input {
  generator {
       message => "2023-03-05 17:07:01.586+0000 [L: WARN] [O: A.b.c.d.e.FGScript] [I: ] [U: email@example.com] [S: ] [P: ] [T: ABCProcessor-23 ] @@@ aboba=5 beboba=1 ceboba=4"
       count => 1
  }
 
} # input

filter {
    grok {
      match => {
        "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD}: %{LOGLEVEL:logLevel}\] \[%{WORD}: %{DATA:o}\] \[%{WORD}: %{DATA:i}?\] \[%{WORD}: %{EMAILADDRESS:u}\] \[%{WORD}: %{DATA:s}?\] \[%{WORD}: %{DATA:p}\] \[%{WORD}: %{DATA:t}\] %{DATA:something} aboba=%{INT:aboba} beboba=%{INT:beboba} ceboba=%{INT:ceboba}"
        # your version "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD}: %{LOGLEVEL:logLevel}\] \[%{WORD}: %{DATA:o}\] \[%{WORD}: %{DATA:i}?\] \[%{WORD}: %{EMAILADDRESS:u}\] \[%{WORD}: %{DATA:s}?\] \[%{WORD}: %{DATA:p}\] \[%{WORD}: %{DATA:t}\] %{GREEDYDATA:logMessage}"
      }
    }
    # another way
	# grok {
       # break_on_match => false
       # match => {
           # message => [
               # "%{TIMESTAMP_ISO8601:timestamp}",
               # "\[L: %{LOGLEVEL:logLevel}\]",
               # "\[O: %{DATA:o}\]",
               # "\[I: %{DATA:i}\]",
               # "\[U: %{EMAILADDRESS:u}\]",
               # "\[S: %{DATA:s}\]",
               # "\[P:%{SPACE}(%{DATA:p})?\]",
               # "\[T: %{DATA:t}\]",
               # "aboba=%{INT:aboba}",
               # "beboba=%{INT:beboba}",
               # "ceboba=%{INT:ceboba}"
           # ]
       # }
   # }
    # fastest way; always emits the field name, whether or not the value is empty
    # dissect {
      # mapping => {
        # "message" => "%{timestamp} [%{?level}: %{&level}] [%{?o}: %{&o}] [%{?i}: %{&i}] [%{?u}: %{&u}] [%{?s}: %{&s}] [%{?p}: %{&p}] [%{?t}: %{&t}] %{msg} %{?abob}=%{&abob} %{?bebob}=%{&bebob} %{?cebob}=%{&cebob}"
      # }
    # }
     # set timestamp as a value from the log
      date {
        match => ["timestamp", "ISO8601"]
        target=>"@timestamp" 
        remove_field => [ "timestamp" ]
      }
	  
      # strip the trailing space from the T field
      mutate {
        strip => ["t", "T"]
      }

    prune { blacklist_names => [ "message", "@version", "location", "host", "event" ] }

}

output {
    stdout {
        codec => rubydebug{ metadata => true}
    }

}

Result:

{
        "ceboba" => "4",
    "@timestamp" => 2023-03-05T17:07:01.586Z,
             "t" => "ABCProcessor-23",
      "logLevel" => "WARN",
             "u" => "email@example.com",
        "beboba" => "1",
         "aboba" => "5",
     "something" => "@@@",
             "o" => "A.b.c.d.e.FGScript"
}

Thanks for your reply, but my problem is slightly different. I need a single universal grok pattern that handles both situations:

  1. When field "[P: ]" (with all square braces) exists in log message.
  2. When field "[P: ]" (with all square braces) is absent.

Then use the 2nd option:

    grok {
       break_on_match => false
       match => {
           message => [
               "%{TIMESTAMP_ISO8601:timestamp}",
               "\[L: %{LOGLEVEL:logLevel}\]",
               "\[O: %{DATA:o}\]",
               "\[I: %{DATA:i}\]",
               "\[U: %{EMAILADDRESS:u}\]",
               "\[S: %{DATA:s}\]",
               "\[P:%{SPACE}(%{DATA:p})?\]",
               "\[T: %{DATA:t}\]",
               "aboba=%{INT:aboba}",
               "beboba=%{INT:beboba}",
               "ceboba=%{INT:ceboba}"
           ]
       }
    }

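Note: the dissect plugin is not suited to optional fields. It maps values by position between fixed delimiters, so a missing "[P: ]" block breaks the mapping.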
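
If a single pattern is preferred, here is a minimal sketch of another approach: wrap the whole "[P: ...]" block together with its trailing space in an optional non-capturing group. The attempts in the question make the bracketed block optional but appear to leave the literal spaces on both sides outside the group, so a line without the block would need two consecutive spaces to match. That is my reading of the fragments, since the full patterns they were pasted into are not shown. The second sample line (without "[P: ]") and the lowercase field names are assumptions for illustration.

```
input {
  generator {
    # first line has the [P: ] block; the second (made up for this test) does not
    lines => [
      "2023-03-05 17:07:01.586+0000 [L: WARN] [O: A.b.c.d.e.FGScript] [I: ] [U: email@example.com] [S: ] [P: ] [T: ABCProcessor-23 ] @@@ aboba=5 beboba=1 ceboba=4",
      "2023-03-05 17:07:01.586+0000 [L: WARN] [O: A.b.c.d.e.FGScript] [I: ] [U: email@example.com] [S: ] [T: ABCProcessor-23 ] @@@ aboba=5 beboba=1 ceboba=4"
    ]
    count => 1
  }
}

filter {
  grok {
    match => {
      # (?:\[P: %{DATA:p}\] )? makes the block AND its trailing space optional
      "message" => "%{TIMESTAMP_ISO8601:timestamp} \[L: %{LOGLEVEL:logLevel}\] \[O: %{DATA:o}\] \[I: %{DATA:i}\] \[U: %{EMAILADDRESS:u}\] \[S: %{DATA:s}\] (?:\[P: %{DATA:p}\] )?\[T: %{DATA:t}\] %{GREEDYDATA:logMessage}"
    }
  }
}
```

With grok's default keep_empty_captures => false, empty captures such as p are not added to the event, and the mutate strip shown earlier is still needed to trim the trailing space from t.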
