Grok best practice

I'm getting deeper and deeper into the weeds building out custom grok patterns, and before I get too far I'd like to know the best practice for handling them.

Is it best to make one giant grok?

grok {
  match => { "message" => "%{GTINTLT:weirdnum}\s%{EPOCH:epoch}\s%{USER:device}\s%{LOGTYPE:logtype}\s%{ACTION:action}\s%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}" }
}

Is it better to break them out into grok chunks? (hehe grok chunks)

grok {
 match => { "message" => "%{GTINTLT:weirdnum}\s%{GREEDYDATA:message}" } 
 overwrite => ["message"]             
}
grok {
 match => { "message" => "%{EPOCH:epoch}\s%{USER:device}\s%{GREEDYDATA:message}" }   
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{LOGTYPE:logtype}\s%{ACTION:action}\s%{GREEDYDATA:message}" }
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{GREEDYDATA:message}" }
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{GREEDYDATA:message}" }  
 overwrite => ["message"]
}
grok {
 match => { "message" => "%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}" }  
 overwrite => ["message"]
}

Or maybe to use multiple matches in a single grok?

grok {
 # a single grok option takes an array of patterns, not repeated match keys;
 # break_on_match => false makes it try every pattern instead of stopping at the first hit
 match => { "message" => [
   "%{GTINTLT:weirdnum}\s%{GREEDYDATA:message}",
   "%{EPOCH:epoch}\s%{USER:device}\s%{GREEDYDATA:message}",
   "%{LOGTYPE:logtype}\s%{ACTION:action}\s%{GREEDYDATA:message}",
   "%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{GREEDYDATA:message}",
   "%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{GREEDYDATA:message}",
   "%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}"
 ] }
 break_on_match => false
 overwrite => ["message"]
}

Please note, I have not tested this one, so I have no clue whether it would work.

I'm old school, so I try to keep lines under 80 characters. But I also know that sticking to that here may make things needlessly complicated.

So I am kind of looking for what the community actually does.

Sequential grok filters with field overwriting are needlessly complicated and will most likely hurt your performance (a lot).

That said, it depends on your use case.
Do you need to squeeze out as much throughput as possible?
Do you want a configuration that suits your way of coding and is easy to debug, with performance not being an issue?
Do all your logs share the same pattern, or is there a mix of formats in there?

More often than not, you want the biggest possible match in a single line.
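As an aside: if those custom patterns (GTINTLT, EPOCH, LOGTYPE, ...) live in a separate patterns file, the grok filter's pattern_definitions option lets you keep small one-off patterns next to the match that uses them. A sketch only; the regexes here are guesses, since the thread never shows what GTINTLT or EPOCH actually look like:

grok {
  pattern_definitions => {
    # assumptions: weirdnum looks like <123>, epoch is Unix seconds
    "GTINTLT" => "<%{INT}>"
    "EPOCH"   => "\d{10}(?:\.\d+)?"
  }
  match => { "message" => "%{GTINTLT:weirdnum}\s%{EPOCH:epoch}\s%{GREEDYDATA:message}" }
  overwrite => ["message"]
}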

That is exactly what I was looking for. Thank you.

I think my plan will be to break out into several groks while I am testing and developing. And once I have it up and running the way that I want, I will combine them all together.

As for my logs, I have logs from just about everything I'll be processing: massive logs, small logs, and all kinds of different setups. As it stands now, I will probably end up with around three or four dozen pipelines to process them all.
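For that many pipelines, a pipelines.yml along these lines keeps each log type isolated (the ids and paths below are purely hypothetical):

# pipelines.yml -- one entry per log type
- pipeline.id: firewall
  path.config: "/etc/logstash/conf.d/firewall.conf"
- pipeline.id: dns
  path.config: "/etc/logstash/conf.d/dns.conf"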

Hi!

I'm in the same situation as you, trying to understand how to make "good" filters. I found this, which I tried to follow. https://www.elastic.co/blog/do-you-grok-grok

Just be aware that it may be a little bit outdated; for instance, "overwrite" will not work the way they show it. As you already know, "overwrite" takes an array, and in the example they pass a string. But apart from those little details, the document seems to be about right.

With that being said, in your case I would just use that "giant" grok, but I would add anchors to it, or at least the beginning one (since you have a GREEDYDATA at the end, a $ anchor there probably doesn't buy you much).
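Anchored, your big grok would look something like this, just a ^ at the front so the regex engine can fail fast on non-matching lines:

grok {
  match => { "message" => "^%{GTINTLT:weirdnum}\s%{EPOCH:epoch}\s%{USER:device}\s%{LOGTYPE:logtype}\s%{ACTION:action}\s%{SRCIP}%{IP:srcip}\s%{DSTIP}%{IP:dstip}\s%{MACADDR}%{MAC:macaddr}\s%{PROTO}%{WORD:protocol}\s%{SPORT}%{INT:sport}\s%{DPORT}%{INT:dport}%{GREEDYDATA:message}" }
}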

Regards,

That was a great read through, thanks for posting that.
I have gone through and optimized my pipeline to increase the throughput now.

I have now added in some anchors. I have also learned a few tricks other than grok. I suspect there is still a ton that I need to learn about parsing logs.
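For example, the dissect filter splits on literal delimiters instead of running a regex, which tends to be much cheaper than grok for fixed-format lines. A sketch against a hypothetical space-delimited line (the field names are illustrative, not from my actual logs):

dissect {
  mapping => {
    "message" => "%{weirdnum} %{epoch} %{device} %{logtype} %{action} %{rest}"
  }
}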

