Grok best practice

I'm getting deeper and deeper into the weeds building out custom grok patterns, and before I get too far I'd like to know the best practice for handling them.

Is it best to make one giant grok?

grok {
  match => { "message" => "%{GTINTLT:weirdnum}\s%{EPOCH:epoch}\s%{USER:device}\s%{LOGTYPE:logtype}\s%{ACTION:action}\s%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}" }
}

Is it better to break them out into grok chunks? (hehe grok chunks)

grok {
 match => { "message" => "%{GTINTLT:weirdnum}\s%{GREEDYDATA:message}" } 
 overwrite => ["message"]             
}
grok {
 match => { "message" => "%{EPOCH:epoch}\s%{USER:device}\s%{GREEDYDATA:message}" }   
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{LOGTYPE:logtype}\s%{ACTION:action}\s%{GREEDYDATA:message}" }
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{GREEDYDATA:message}" }
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{GREEDYDATA:message}" }  
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}" }  
 overwrite => ["message"]
}

Or maybe to use multiple matches in a single grok?

grok {
 # a single grok option takes an array of patterns, not repeated match keys;
 # break_on_match => false makes it try every pattern instead of stopping at the first hit
 match => { "message" => [
   "%{GTINTLT:weirdnum}\s%{GREEDYDATA:message}",
   "%{EPOCH:epoch}\s%{USER:device}\s%{GREEDYDATA:message}",
   "%{LOGTYPE:logtype}\s%{ACTION:action}\s%{GREEDYDATA:message}",
   "%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{GREEDYDATA:message}",
   "%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{GREEDYDATA:message}",
   "%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}"
 ] }
 break_on_match => false
 overwrite => ["message"]
}

Please note, I have not tested this one, so I have no clue whether it would work.

I'm old school, so I try to keep lines under 80 characters. But I also know that sticking to that here may make things needlessly complicated.

So I am kind of looking for what the community actually does.

Sequential grok filters with field overwriting are needlessly complicated and will most likely hurt your performance (a lot).

That said, it depends on your use case.
Do you need to squeeze out as much throughput as possible?
Do you want a configuration that suits your way of coding and is easy to debug, with performance not being an issue?
Do all your logs share the same pattern, or is there a mix of formats in there?

More often than not, you want the biggest possible match in a single line.
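As an aside: if those custom patterns (GTINTLT, EPOCH, LOGTYPE, ...) live in a separate patterns file, the grok filter's pattern_definitions option lets you keep small one-off patterns next to the match that uses them. A sketch only; the regexes here are guesses, since the thread never shows what GTINTLT or EPOCH actually look like:

grok {
  pattern_definitions => {
    # assumptions: weirdnum looks like <123>, epoch is Unix seconds
    "GTINTLT" => "<%{INT}>"
    "EPOCH"   => "\d{10}(?:\.\d+)?"
  }
  match => { "message" => "%{GTINTLT:weirdnum}\s%{EPOCH:epoch}\s%{GREEDYDATA:message}" }
  overwrite => ["message"]
}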

That is exactly what I was looking for. Thank you.

I think my plan will be to break out into several groks while I am testing and developing. And once I have it up and running the way that I want, I will combine them all together.

As for my logs, I have logs from just about everything I'll be processing: massive logs, small logs, and all kinds of different setups. As it stands now, I will probably end up with around three or four dozen pipelines to process them all.
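For that many pipelines, a pipelines.yml along these lines keeps each log type isolated (the ids and paths below are purely hypothetical):

# pipelines.yml -- one entry per log type
- pipeline.id: firewall
  path.config: "/etc/logstash/conf.d/firewall.conf"
- pipeline.id: dns
  path.config: "/etc/logstash/conf.d/dns.conf"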

Hi!

I'm in the same situation as you, trying to understand how to make "good" filters. I found this, which I tried to follow. https://www.elastic.co/blog/do-you-grok-grok

Just be aware that it may be a little bit outdated; for instance, "overwrite" will not work the way they show it. As you already know, "overwrite" takes an array, and in the example they pass a string. But apart from those little details, the document seems to be about right.

With that being said, in your case I would just use that "giant" grok, but I would add anchors to it, or at least the beginning one (since you have a GREEDYDATA at the end, a $ anchor there probably doesn't buy you much).
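Anchored, your big grok would look something like this, just a ^ at the front so the regex engine can fail fast on non-matching lines:

grok {
  match => { "message" => "^%{GTINTLT:weirdnum}\s%{EPOCH:epoch}\s%{USER:device}\s%{LOGTYPE:logtype}\s%{ACTION:action}\s%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}" }
}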

Regards,

That was a great read through, thanks for posting that.
I have gone through and optimized my pipeline to increase the throughput now.

I have now added in some anchors. I have also learned a few tricks other than grok. I suspect there is still a ton that I need to learn about parsing logs.
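For example, the dissect filter splits on literal delimiters instead of running a regex, which tends to be much cheaper than grok for fixed-format lines. A sketch against a hypothetical space-delimited line (the field names are illustrative, not from my actual logs):

dissect {
  mapping => {
    "message" => "%{weirdnum} %{epoch} %{device} %{logtype} %{action} %{rest}"
  }
}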

