Grok performance problem when parsing dots, dashes, underscores


#1

Hi all,
I'm running into a performance problem with the grok filter.
I use Filebeat 5.4.6 to send log file events to Logstash 5.4.6.
I made a very simple grok filter in Logstash to extract the path and the filename of the log file from the "source" field from Filebeat:
grok {
  match => { "source" => "%{UNIXPATH:[filepath]}/%{NOTSPACE:[filename]}" }
}

It works very well with a lot of filenames, but the filter becomes very slow when there are many dots, dashes, or underscores in the filename.
Example: /var/log/nginx/mynginx01access.log -> very fast
/var/log/nginx/my_nginx-01.access.log -> very slow and CPU costly

I tried many patterns, replacing %{NOTSPACE} with %{DATA}, %{GREEDYDATA}, etc., without any result. The CPU load of the filter seems to grow exponentially with the number of (., -, _) characters in the filename.
If you replace (., -, _) with other special characters (#, $, ^, space, ...), it's fast again.
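The slowdown can be reproduced outside Logstash. Here is a small Python sketch; the UNIXPATH expansion below is quoted from the stock grok-patterns file from memory, so verify it against your install:

```python
import re
import time

# UNIXPATH-like pattern with nested quantifiers (assumed stock definition:
# (/([\w_%!$@:.,+~-]+|\\.)*)+ ). A run of word characters can be split
# between the inner "+" and the outer "*" in exponentially many ways, so a
# FAILED match backtracks through all of them.
unixpath_like = re.compile(r'^(/([\w_%!$@:.,+~-]+|\\.)*)+$')

# A successful match is found quickly:
assert unixpath_like.match('/var/log/nginx/mynginx01access.log')

# A string that cannot match (trailing space) forces full backtracking;
# every extra dot/dash/underscore roughly doubles the work.
t0 = time.perf_counter()
assert unixpath_like.match('/var/my_nginx-01.acc ') is None
print('failed match took %.4fs' % (time.perf_counter() - t0))
```

Lengthening the dotted/dashed part of the failing input by a few characters makes the failed match dramatically slower, which matches the exponential behavior described above.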

I don't know how to fix this problem, because I have tried every pattern I could think of.

Help would be very appreciated.

Simon


(Guy Boertje) #2

Read this https://www.elastic.co/blog/do-you-grok-grok

then after that try:
^%{UNIXPATH:[filepath]}/%{JAVAFILE:[filename]}$
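For reference, the stock definitions behind those two names (quoting the grok-patterns file from memory, so double-check the copy shipped with your install) are roughly:

```
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
JAVAFILE (?:[A-Za-z0-9_. -]+)
```

Note that JAVAFILE has a single flat repetition, so it cannot backtrack the way UNIXPATH's nested `(...)*)+` quantifiers can.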


#3

Thank you for your response.
First, I made a mistake on the versions of Filebeat and Logstash: I actually work on the latest 5.6.2, on CentOS 7u3.
I tried changing the match expression as you suggested, but it didn't solve the problem; I reproduced the same bad execution time.


#4

I did a few more tests and solved the problem by replacing the UNIXPATH pattern with DATA or GREEDYDATA.
grok {
  match => { "source" => "^%{DATA:[fields][filepath]}/%{JAVAFILE:[fields][filename]}$" }
}

I don't understand how matching the UNIXPATH pattern can be dependent on the format of the last part of the string, after the "/".

Thank you for your first response.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.