Grok with multiple matches - can I assign type based on match?


(James Manning) #1

I'm parsing an IRC log and trying to assign different types based on which type of log entry a given line is.

Currently I'm using N different matches in a single grok filter, since a given line should match at most one of them, and then doing conditional mutates after. This feels like I'm doing something wrong.

I could break them up into N different grok calls each setting the type, but then I end up getting valid lines tagged as failures since each line will fail to match at least one of the expressions. I could override those to not set the failure tag, but it feels like I'm missing something going down this path.

I've tried adding add_field after each match, but it then adds all the fields instead of just the one from the previous match.

Version:

iMac:logstash james$ logstash --version
logstash 2.4.0

Test input:

iMac:logstash james$ cat test.log 
--- Log opened Tue Sep 06 00:00:04 2016
--- Day changed Tue Sep 06 2016
00:00:04+0200 <+User1> Some message here
00:13:52+0200 -!- User2 [User2@some.site] has joined #somechannel
00:05:33+0200 -!- User3 [User3@some.other.site] has left #somechannel []
08:46:06+0200  * User4 does some action

Test configuration:

iMac:logstash james$ cat test.conf 
input {
  stdin { }
}

filter {
  grok {
    # normal chat message entry
    match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}<[@ +*]%{SPACE}(?<user>[^>]+)>%{SPACE}(?<messageText>.*)" }

    # user did /me action
    match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}\* (?<user>[^ ]+)%{SPACE}(?<action>.*)" }

    # user joined entry
    match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}-!-%{SPACE}(?<user>[^ ]+) \[.*\] has joined .*" }

    # user left entry
    match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}-!-%{SPACE}(?<user>[^ ]+) \[.*\] has left .*" }
  }

  if [messageText] =~ "." {
    mutate { replace => { "type" => "chatMessage" } }
  } else if [action] =~ "." {
    mutate { replace => { "type" => "action" } }
  } else if [message] =~ "has joined" {
    mutate { replace => { "type" => "joined" } }
  } else if [message] =~ "has left" {
    mutate { replace => { "type" => "left" } }
  }

  if "_grokparsefailure" in [tags] {
    drop {}
  }
}

output {
  stdout { codec => rubydebug }
}

Output from running shows it works fine, but that set of conditional mutate calls feels like I'm missing the Right Way(tm) to be doing this. :smile:

iMac:logstash james$ logstash -f test.conf < test.log 
Settings: Default pipeline workers: 4
Pipeline main started
{
        "message" => "00:00:04+0200 <+User1> Some message here",
       "@version" => "1",
     "@timestamp" => "2016-09-13T15:06:31.485Z",
           "host" => "iMac.local",
           "user" => "User1",
    "messageText" => "Some message here",
           "type" => "chatMessage"
}
{
       "message" => "00:13:52+0200 -!- User2 [User2@some.site] has joined #somechannel",
      "@version" => "1",
    "@timestamp" => "2016-09-13T15:06:31.485Z",
          "host" => "iMac.local",
          "user" => "User2",
          "type" => "joined"
}
{
       "message" => "00:05:33+0200 -!- User3 [User3@some.other.site] has left #somechannel []",
      "@version" => "1",
    "@timestamp" => "2016-09-13T15:06:31.485Z",
          "host" => "iMac.local",
          "user" => "User3",
          "type" => "left"
}
{
       "message" => "08:46:06+0200  * User4 does some action",
      "@version" => "1",
    "@timestamp" => "2016-09-13T15:06:31.485Z",
          "host" => "iMac.local",
          "user" => "User4",
        "action" => "does some action",
          "type" => "action"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
iMac:logstash james$

(Magnus Bäck) #2

Perhaps this would be more elegant:

grok {
  # normal chat message entry
  match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}<[@ +*]%{SPACE}(?<user>[^>]+)>%{SPACE}(?<messageText>.*)" }
  add_field => {
    "type" => "chatMessage"
  }
}

grok {
  # user did /me action
  match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}\* (?<user>[^ ]+)%{SPACE}(?<action>.*)" }
  add_field => {
    "type" => "action"
  }
}

You'll have to deal with non-matches in some clever way, though. Perhaps by deleting the type field before the grok filters, setting tag_on_failure => [] in all grok filters, and dropping the event unless type has been set?
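
The non-match handling described above could look roughly like the sketch below. It is untested here and shows only two of the four patterns from the original configuration; the remaining grok filters would follow the same shape. It assumes nothing upstream needs a pre-existing type value, since that field is cleared first.

```
filter {
  # Clear any type set upstream so only a grok match can set it.
  mutate { remove_field => [ "type" ] }

  grok {
    # normal chat message entry
    match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}<[@ +*]%{SPACE}(?<user>[^>]+)>%{SPACE}(?<messageText>.*)" }
    add_field => { "type" => "chatMessage" }
    # Do not tag the event when this particular pattern misses.
    tag_on_failure => []
  }

  grok {
    # user did /me action
    match => { "message" => "%{HOUR}:%{MINUTE}:%{SECOND}%{ISO8601_TIMEZONE}%{SPACE}\* (?<user>[^ ]+)%{SPACE}(?<action>.*)" }
    add_field => { "type" => "action" }
    tag_on_failure => []
  }

  # No grok filter matched, so no type was ever added: discard the event.
  if ![type] {
    drop {}
  }
}
```

Because add_field is applied only when its grok filter succeeds, the presence of type doubles as the "something matched" signal, which replaces the _grokparsefailure check.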


(James Manning) #3

That's a great idea! Thanks, @magnusbaeck !! :+1:


(Lucianspec) #4

Perhaps that's not a good suggestion...
Writing two grok blocks makes the match run twice for every event, which is wasteful.
Also, break_on_match => true only stops the matching phase inside a single grok filter; it does not stop later grok filters from running. So your example will produce this:

"type" => ["chatMessage", "action"], which most of the time is not what you want.

Grok also has no way to report which match pattern succeeded, so the only way I have found is:

grok {
  patterns_dir => ["${PROJECT}/patterns"]
  match => { "message" => "^%{STORM_LOG:storm_log}$" }
  match => { "message" => "^%{STORM_MESSAGE:storm_message}$" }
}

if [storm_log] {
  mutate { 
    remove_field => "storm_log"
    add_field => { "match" => "STORM_LOG" }
  }
} else if [storm_message] {
  mutate {
    remove_field => "storm_message"
    add_field => { "match" => "STORM_MESSAGE" }
  }
}

@magnusbaeck Right?


(Magnus Bäck) #5

so your example will produce this:

"type" => ["chatMessage", "action"], which most of the time is not what you want.

add_field only fires when the grok filter is successful. Unless both expressions match the same message, I don't see why you'd get an array.


(Lucianspec) #6

Okay, I read the source code and that's true.
But I still want to point out that it's hard to guarantee that the sets of strings each regex can match don't overlap.
We adjust the priority by changing the order of the regexes.

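line = "integer"  # example input; all three patterns below match it
case line
when /integer/
  puts "this is an integer"  # the first matching branch wins
when /int/
  puts "this is an int"
when /i/
  puts "this is an i"
end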

Most of the time what we need is a case statement, but grok's DSL seems to force a choice:

  1. write per-branch logic, but lose the "break" keyword
  2. use break_on_match, but it can't tell you which pattern matched...

It's so strange...
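
One way to approximate case-like, first-match-wins behavior is to guard each later grok filter with a conditional on a marker field, so later patterns are skipped once an earlier one has matched. The sketch below is untested and uses the three overlapping regexes from the Ruby example as stand-in patterns; the field name kind is hypothetical and is assumed to be unset on incoming events.

```
filter {
  grok {
    match => { "message" => "integer" }
    add_field => { "kind" => "integer" }
    # A miss here is not a failure; later branches may still match.
    tag_on_failure => []
  }

  # Runs only if the previous branch did not match.
  if ![kind] {
    grok {
      match => { "message" => "int" }
      add_field => { "kind" => "int" }
      tag_on_failure => []
    }
  }

  if ![kind] {
    grok {
      match => { "message" => "i" }
      add_field => { "kind" => "i" }
      # Last branch keeps the default _grokparsefailure tag,
      # so events that match nothing are still marked.
    }
  }
}
```

Priority is still set by order, as with the regex list in a single grok filter, but each branch now records which pattern matched, and the guarded filters do not run at all once kind is set.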


(Young Bae Jeon) #7

Thank you!
Really, thank you!