I have quite a long regex to parse my log messages, but it fails with "Timeout executing grok". Does grok have any mechanisms to speed up regex matching? Also, I have multiple regexes: if one fails, do the others still work?
Have you tried adjusting the timeout setting?
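For reference, the setting in question is the grok filter's `timeout_millis` option (30000 ms by default, applied per pattern). Events that hit it are tagged with the value of `tag_on_timeout`, which defaults to `_groktimeout`. A minimal sketch; the 5000 ms value is an illustrative assumption, not a recommendation:

```
filter {
  grok {
    match => { "message" => "\A%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}" }
    # illustrative value; the default is 30000 ms
    timeout_millis => 5000
    # the default tag, written out explicitly
    tag_on_timeout => "_groktimeout"
  }
}
```

Raising the value only helps events that eventually match; for events that can never match, it just lets each failed attempt run longer before giving up.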
I thought about that, but I have 1M logs, and I guess it would finish in my next life.
Perhaps share a few log lines and the grok patterns; maybe we can help tune them.
Sometimes breaking a line down with dissect first and then grokking the rest can help.
Or grok part of it, capturing the rest with GREEDYDATA (or not, see below), then use one of the fields from the first grok as an identifier for more specific groks.
I.e., break it down in iterations.
Just a thought.
I'm not a regex expert though.
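The iterative idea above might look roughly like this. A sketch only: the `rest` field name is hypothetical, and the two branches key on the `Function:` and `In module:` text from the sample logs posted later in this thread:

```
filter {
  # Pass 1: anchored, cheap grok that extracts only the header and keeps the remainder.
  # (?m) lets GREEDYDATA (.*) run across the newlines of a multi-line message.
  grok {
    match => { "message" => "(?m)\A%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}%{GREEDYDATA:rest}" }
  }
  # Pass 2: choose a more specific grok based on what pass 1 captured
  if "Function: " in [rest] {
    grok {
      match => { "rest" => "Module: (?<module>[^;]+); Function: (?<func>[^;]+)" }
    }
  } else if "In module: " in [rest] {
    grok {
      match => { "rest" => "In module: (?<module>\S+)" }
    }
  }
}
```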
I suggest you read this blog. Anchor your patterns. Avoid GREEDYDATA (try DATA instead; for delimited fields, use custom patterns such as (?<someField>[^,]*) if a field ends with a comma). If you can show an example event and your patterns, we can probably be more specific.
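Both tips combined, as a sketch: the pattern is anchored with `\A`, and instead of `.*` it uses a delimiter-bounded class defined once through the grok `pattern_definitions` option. `NOTSEMI` is a made-up name for this example, and the pattern only covers the start of the log format posted in this thread:

```
filter {
  grok {
    # NOTSEMI is a hypothetical custom pattern: "everything up to the next semicolon"
    pattern_definitions => { "NOTSEMI" => "[^;]+" }
    # \A anchors at the start of the message, so a line that does not begin
    # with a log level fails at position 0 instead of being retried at every offset
    match => {
      "message" => "\A%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}\s+Message:\s+User - %{NOTSEMI:user};"
    }
  }
}
```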
Yep. Here is my filter block:
filter {
  if 'django' in [tags] {
    grok {
      match => {
        "message" => '(?m)%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}.*User- (?<user>[^;]+).*id - (?<user-id>[^;]+).*email - (?<email>[^;]+).*Agent-(?<useragent>[^;]+).*Request: \"(?<request>[^\s^\"]+).*Method: \"(?<method>[\w]+).*Module: (?<module>[^;]+).*Function: (?<func>[^;]+)' # backend.info.log
      }
    }
  } else {
    grok {
      ### other simple regexes ###
    }
  }
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601"]
    timezone => "Europe/Moscow"
  }
}
I have a lot of Django logs of different types, including legacy ones. Legacy logs should not match this regex, but they cause the timeout error. And I don't know how long it takes a correct log to pass this regex.
Here is a legacy log:
INFO 2019-08-24 11:52:56,619
Message:
User - Бондаренко; id - 111111; email - test@mail.ru;
Agent - Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18363;
ip - 10.205.0.10;
Request: "/responses/" Method: "POST";
In module: /app/lib/utils/logging.py
Here is a correct log:
INFO 2021-04-04 20:31:04,621
Message:
User - abykov_pexu; Name: Быков; id - 11111; email - test@mail.ru;
Agent - Mozilla/5.0 (Linux; Android 5.0.2; SAMSUNG SM-T531 Build/LRX22G) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.3 Chrome/38.0.2125.102 Safari/537.36;
ip - 188.170.1.1;
token: eyJ0....ZGhI
Request: "/auth/logout/" Method: "POST" Parameters: {}
Request body:
Module: oracle.views; Function: logout;
I may have made some typos because I translated it, but I checked my original regex on https://grokdebug.herokuapp.com/ and it works.
That is typically the case. If the pattern matches then it matches, but if it does not match then grok spends lots of time trying a match and then backtracking to try a different match. All those .* expressions embedded in your pattern are expensive when the pattern does not match.
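One cheap way to keep legacy events away from that pattern entirely is a plain substring check before grok runs. A sketch, assuming (from the two samples above only) that `Function: ` appears in the newer format and never in legacy logs; the grok pattern itself is the original one, unchanged:

```
filter {
  # A literal substring test is not a regex and never backtracks; legacy
  # events without "Function: " skip the expensive pattern entirely.
  if "django" in [tags] and "Function: " in [message] {
    grok {
      match => {
        "message" => '(?m)%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}.*User- (?<user>[^;]+).*id - (?<user-id>[^;]+).*email - (?<email>[^;]+).*Agent-(?<useragent>[^;]+).*Request: \"(?<request>[^\s^\"]+).*Method: \"(?<method>[\w]+).*Module: (?<module>[^;]+).*Function: (?<func>[^;]+)'
      }
    }
  }
}
```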
I would take a different approach, using multiple patterns:
grok {
  break_on_match => false
  match => {
    "message" => [
      "\A%{LOGLEVEL:log-level} %{TIMESTAMP_ISO8601:timestamp}",
      "User- (?<user>[^;]+)",
      "id - (?<user-id>[^;]+)",
      "email - (?<email>[^;]+)",
      "Agent-(?<useragent>[^;]+)",
      "Request: \"(?<request>[^\s^\"]+)",
      "Method: \"(?<method>[\w]+)",
      "Module: (?<module>[^;]+)",
      "Function: (?<func>[^;]+)"
    ]
  }
}
I'm not sure whether you need (?m) on all of those patterns; you will have to test that.
I read the article you sent; it has a lot of good tips. But do you think it's a good idea to disable break_on_match and try every regex against the incoming message? It confused me a little.
Will grok stop after the first match, or try all of them?
If you set break_on_match => false, then grok will try each of the patterns in turn, regardless of whether they match or not.
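For contrast, with the default `break_on_match => true`, grok tries the patterns in the listed order and stops at the first one that matches, which suits mutually exclusive formats such as the legacy and current logs above. A sketch; the two patterns are illustrative and key only on the `Module:`/`Function:` and `In module:` parts of the samples:

```
filter {
  grok {
    # true is the default: patterns are tried in order, and grok stops at the first match.
    # First pattern: current format, e.g. "Module: oracle.views; Function: logout;"
    # Second pattern: legacy format, e.g. "In module: /app/lib/utils/logging.py"
    break_on_match => true
    match => {
      "message" => [
        "Module: (?<module>[^;]+); Function: (?<func>[^;]+)",
        "In module: (?<module>\S+)"
      ]
    }
  }
}
```

Events matching neither pattern get the usual `_grokparsefailure` tag.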