How do you analyze Logstash grok parsing errors?

I want to extract systemname and typelog from the path.

sample:
/usr/local/xxx-springboot/logs/xxx/operator.log

Expected result:
systemname => xxx
typelog => operator

My grok config:

filter {
  grok {
    match => {
      "source" => "/%{WORD}/%{WORD}/%{GREEDYDATA}/%{WORD}/%{WORD:servicename}/%{WORD:typelog}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["10.81.176.31", "10.27.69.118"]
    index => "%{[servicename]}-%{+YYYY.MM.dd}"
  }
}

But it fails to parse, and I can't find where it goes wrong.
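(For reference, a minimal way to make such failures visible, assuming you can test against a console output: print each event with the rubydebug codec and look for the _grokparsefailure tag that grok adds when it cannot match.)

output {
  # rubydebug prints the full event; events grok could not match carry
  # "tags" => ["_grokparsefailure"], which makes failures easy to spot.
  stdout { codec => rubydebug }
}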

Hi,

I assume that your path structure always looks like this:

/usr/local/<SYSTEMNAME>-springboot/logs/<SYSTEMNAME>/<LOGTYPE>.log

Then I would parse it like this:

grok pattern (the [^/]+ skips the <SYSTEMNAME> directory, and ^ / \.log$ anchor the match):

^/usr/local/%{NOT_DASH:systemname}-springboot/logs/[^/]+/%{GREEDYDATA:typelog}\.log$

And you need to add this custom pattern:
NOT_DASH [^\-]+
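Put together as a filter, that could look like this (a sketch: it assumes the path still arrives in the source field, as in your config, and uses the grok filter's pattern_definitions option to define NOT_DASH inline instead of in a separate patterns file):

filter {
  grok {
    # Define the custom pattern inline; a patterns_dir file works as well.
    pattern_definitions => { "NOT_DASH" => "[^-]+" }
    match => {
      "source" => "^/usr/local/%{NOT_DASH:systemname}-springboot/logs/[^/]+/%{GREEDYDATA:typelog}\.log$"
    }
  }
}

With your sample path this yields systemname => xxx and typelog => operator.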


Regards, Andreas

Thanks for your advice. I found the cause of the problem: I need to use %{WORD} as little as possible, because an earlier %{WORD} may consume text that was meant for a later one.

But be careful with GREEDYDATA. It is equivalent to .* in the underlying regex. If you have very long fields, such as stack traces, and you match them with GREEDYDATA, Logstash may have to scan the whole field and only fail at the very end, once no match is found.
For every regex / grok pattern, set anchors like ^ or $, or static text, where possible. Make it fail as fast as possible.
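A sketch of the difference, using two made-up variants of the same pattern (the dir field name is illustrative):

# Unanchored and starting with GREEDYDATA: the engine tries every starting
# position and backtracks through the whole field before giving up.
match => { "source" => "%{GREEDYDATA}/%{WORD:typelog}\.log" }

# Anchored with ^, $ and static text: a field that does not start with
# "/usr/local/" is rejected at the first character.
match => { "source" => "^/usr/local/%{GREEDYDATA:dir}/%{WORD:typelog}\.log$" }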

If you want to tune a regex, I can recommend regex101.com. It shows the number of steps the engine needs, so you can easily spot wasteful uses of GREEDYDATA (.*). You do have to convert the grok pattern to a plain regex first, since the syntax for defining variables / capture groups differs slightly.
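As an example of that conversion (a sketch): each %{PATTERN:field} becomes a named group (?<field>...), so the pattern from above turns into a regex you can paste into regex101.com:

^/usr/local/(?<systemname>[^-]+)-springboot/logs/[^/]+/(?<typelog>.*)\.log$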

I always try to use GREEDYDATA as little as possible and instead use patterns like NOT_DASH that match anything up to a given character.
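For the path in this thread, that could look like this (a sketch with a hypothetical NOT_SLASH pattern; both custom patterns stop at a delimiter instead of matching greedily across the rest of the field):

filter {
  grok {
    pattern_definitions => {
      "NOT_DASH"  => "[^-]+"   # anything up to the next dash
      "NOT_SLASH" => "[^/]+"   # anything up to the next slash
    }
    match => {
      "source" => "^/usr/local/%{NOT_DASH:systemname}-springboot/logs/%{NOT_SLASH}/%{NOT_SLASH:typelog}\.log$"
    }
  }
}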
