Howto eas{y|ier} debug grok parse failures

Trying to find a grok parse failure for hours now, though I seem to get my wanted fields and wondering how people go about debugging such failures other than the old interpreted language method of inserting print statements kind of thingy.

I've got few custom patterns in /etc/logstash/patterns/owm:

LOGTIME %{YEAR}%{MONTHNUM2}%{MONTHDAY} %{HOUR}%{MINUTE}(?>[0-9]){2}(?>[0-9]){3}%{ISO8601_TIMEZONE}
STATBASE %{LOGTIME:logtime} %{HOSTNAME:host} %{WORD} %{POSINT};%{WORD:stat}(%{POSINT}/%{POSINT})
STATRATE %{INT:rate}/sec %{INT:stim} ms [%{INT:pn}] %{INT:t1n1}/%{INT:t1n2}/%{INT:t1n3} %{INT:t2n1}/%{INT:t2n2}/%{INT:t2n3}
STATVALUE {INT:value}

got a filter like this:

filter {
# All OWM stat are prefixed with date+time host etc.
grok {
patterns_dir => "/etc/logstash/patterns"
overwrite => ["message"]
match => { "message" => "\A%{STATBASE} *%{GREEDYDATA:message}\Z" }
add_tag => ["owmstat"]
tag_on_failure => ["grok1"]
}
if "owmstat" in [tags] {
grok {
patterns_dir => "/etc/logstash/patterns"
overwrite => ["message"]
match => { "message" => "\A%{STATRATE}%{GREEDYDATA:message}\Z" }
tag_on_failure => ["grok2"]
}
grok {
patterns_dir => "/etc/logstash/patterns"
overwrite => ["message"]
match => { "message" => "\A%{STATVALUE}%{GREEDYDATA:message}\Z" }
tag_on_failure => ["grok3"]
}
# standardize on GMT timestamp
date {
target => "@timestamp"
match => ["logtime", "YYYYMMdd HHmmssSSSZ"]
remove_field => ["logtime"]
}
# Drop redundant/useless fields
mutate {
remove_field => ["type", "beat", "count", "fields", "input_type", "offset", "source"]
remove_tag => ["shipper-filebeat", "owmstat"]
}
}
}

which when receiving a message like this:

20151206 142648504+0100 fep1 imapserv 29448;StatFepResolveMS(77/523) 0/sec 0 ms [0] 0/0/0 0/0/0

will give me below result with parse failure, but howto pin point why?

{
"message" => "0/sec 0 ms [0] 0/0/0 0/0/0",
"@version" => "1",
"@timestamp" => "2015-12-06T13:11:48.500Z",
"tags" => [
[0] "output-stdout",
[1] "_grokparsefailure",
[2] "grok3"
],
"host" => "fep1",
"stat" => "StatFepResolveMS",
"rate" => "0",
"stim" => "0",
"pn" => "0",
"t1n1" => "0",
"t1n2" => "0",
"t1n3" => "0",
"t2n1" => "0",
"t2n2" => "0",
"t2n3" => "0",
"@metadata" => {
"beat" => "filebeat",
"type" => "log"
}
}

hints much appreciated, TIA

STATBASE %{LOGTIME:logtime} %{HOSTNAME:host} %{WORD} %{POSINT};%{WORD:stat}(%{POSINT}\/%{POSINT})

Make sure you escape the parentheses (but you can skip escaping forward slashes). That's one reason why it isn't working that I could spot right away.

For the more general question of how to debug grok expressions, I suggest starting with the simplest possible expression and increasing complexity by stepwise appending more of the original expression until you once again get _grokparsefailure). In this case, starting off with

%{STATBASE}

won't help since the problem seems to be buried inside the definition of STATBASE (which is one reason why I think grok patterns defined outside the Logstash configuration file is a PITA). Expand STATBASE into its definition,

%{LOGTIME:logtime} %{HOSTNAME:host} %{WORD} %{POSINT};%{WORD:stat}(%{POSINT}\/%{POSINT})

and reduce it:

%{LOGTIME:logtime}

Does that work? Good—append %{HOSTNAME:host}, and so on.

Right, also how I built it in the first place, bottom up and btw STATBASE should be work as it crops message in the first grok1, and the parse failure seems to be raised in grok3 or...

I'll once (actually several times) again to strip down backwards...

You're right external patterns might be a PITA, but 'normalizing' patterns makes it easier to maintain in case f.ex. your change your input formats, like all the predefined ones :slight_smile:

1 Like

Using this pattern:

STATRATE %{INT:rate}/sec %{INT:stim} ms [%{INT:pn}] %{INT:t1n1}/%{INT:t1n2}/%{INT:t1n3} %{INT:t2n1}/%{INT:t2n2}/

leaves as expected my last integer in message:

{
"message" => "0",
"@version" => "1",
"@timestamp" => "2015-12-06T21:23:48.624Z",
"tags" => [
[0] "output-stdout",
[1] "_grokparsefailure"
],
"host" => "fep1",
"stat" => "StatFepResolveMS",
"rate" => "0",
"stim" => "0",
"pn" => "0",
"t1n1" => "0",
"t1n2" => "0",
"t1n3" => "0",
"t2n1" => "0",
"t2n2" => "0",
"@metadata" => {
"beat" => "filebeat",
"type" => "log",
"logts" => "20151206 222348624+0100",
"esindex" => "filebeat"
}
}

Weird as grokconstructor indicates I should be able to pick up the last number as well, why the F... not when running in logstash then (pardon my french ::wink:)