Grok performance


(ssh) #1

hi there!

I use multiple grok match patterns in my Logstash filter.

For example:

filter {
  grok {
    # grok1
    break_on_match => true
    match => { "message" => [ "^(?<log_timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{1,3}).?(?\d+)\s{0,}%{WORD:log_level}.?\s:\sV21.OneTwoThreeHelper\s:\sPageRequest\s{0,}::\s{0,}(?<session_id>\w+)\s{0,}.?.?(?[0-9a-zA-Z]{0,})(?.?)"]}
    add_field => { "project_type" => "PROJ1" }
    add_field => { "transaction_type" => "REQUEST" }
  }
  grok {
    # grok2
    break_on_match => true
    match => { "message" => [ "(?<log_timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{1,3}).
(?\d+).?%{WORD:log_level}.?\s:\sPageResponse\s{0,}::\s{0,}(?<session_id>\w+)\s{0,}::\s{0,}.?.?(?[0-9a-zA-Z]{0,})(?.?)"]}
    add_field => { "project_type" => "PROJ2" }
    add_field => { "transaction_type" => "RESPONSE" }
  }
  grok {
    # grok3
    break_on_match => true
    match => { "message" => [ "(?<log_timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{1,3}).
(?\d+).?%{WORD:log_level}.?\s:\sPage3\s{0,}::\s{0,}(?<session_id>\w+)\s{0,}::\s{0,}.?.?(?[0-9a-zA-Z]{0,})(?.*?)"]}
    add_field => { "project_type" => "PROJ3" }
    add_field => { "transaction_type" => "RESPONSE" }
  }
  grok {....}
  grok {....}
  grok {grok40}
}

  1. With break_on_match => true, if a log event matches grok2, does processing exit only grok2 or the whole filter?
  2. I use 40 grok blocks, and I'm not sure about the performance impact of running that many. How can I improve this, and how should I plan for and manage Logstash performance?

If anyone can give advice on these, I'd appreciate it :slight_smile:


(Christian Dahlqvist) #2

Grok allows you to define multiple patterns in the same block, and this is where the 'break_on_match' parameter is useful: it stops processing once a match is found. If you have multiple grok blocks, they will all be evaluated, as the parameter does not span across blocks.
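
As a rough illustration of several patterns in one block, here is a minimal sketch. The three patterns are simplified, hypothetical stand-ins for the much longer expressions in the question, not a drop-in replacement. With break_on_match => true (the default), grok tries the patterns in order and stops at the first one that matches.

```
filter {
  grok {
    # Patterns are tried top to bottom; grok stops at the first match.
    break_on_match => true
    match => {
      "message" => [
        # Simplified, hypothetical patterns for illustration only.
        "PageRequest\s*::\s*(?<session_id>\w+)",
        "PageResponse\s*::\s*(?<session_id>\w+)",
        "Page3\s*::\s*(?<session_id>\w+)"
      ]
    }
  }
}
```

Listing the most frequently matching pattern first should mean that most events need only one attempt.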


(ssh) #4

hi @Christian_Dahlqvist

thank you

I thought that when a grok pattern matched, the block would exit and skip the following lines, including add_field.
That's why I wrote multiple blocks :smiley:

I have now merged everything into one grok block with break_on_match turned on.
Then I found this:

project_type: PROJ1,PROJ2,PROJ3,.....PROJ40
transaction: REQUEST,RESPONSE,TEST,etc.

break_on_match only skips the remaining patterns; every add_field is still applied.
What should I change to fix that? Please check my config again.


(Neal Caffery) #5

(Christian Dahlqvist) #6

It will add the fields if there is a successful match for any one of the patterns in the block. Could you perhaps define the project at the source, e.g. in Filebeat, or use conditionals on the fields after the data has been extracted?
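
One way the conditional approach might look is sketched below. It assumes the marker that distinguishes the log lines (PageRequest, PageResponse, Page3) can be captured into a field. The field name page_type and the shortened pattern are hypothetical, chosen only for illustration; the project and transaction values are taken from the config in the question.

```
filter {
  grok {
    # Hypothetical, shortened pattern: capture the marker into page_type.
    match => {
      "message" => "(?<page_type>PageRequest|PageResponse|Page3)\s*::\s*(?<session_id>\w+)"
    }
  }

  # Add the fields based on what was extracted, not on which block matched.
  if [page_type] == "PageRequest" {
    mutate {
      add_field => {
        "project_type"     => "PROJ1"
        "transaction_type" => "REQUEST"
      }
    }
  } else if [page_type] == "PageResponse" {
    mutate {
      add_field => {
        "project_type"     => "PROJ2"
        "transaction_type" => "RESPONSE"
      }
    }
  } else if [page_type] == "Page3" {
    mutate {
      add_field => {
        "project_type"     => "PROJ3"
        "transaction_type" => "RESPONSE"
      }
    }
  }
}
```

Because each event takes exactly one branch, it ends up with a single project_type and transaction_type rather than the combined list.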

