Varnish grok pattern optimization

Hi,

I am looking for help in optimizing my grok pattern for my varnishncsa logs.

This is what I have now:

%{IP:ip1}, %{IP:ip2} - \[%{HTTPDATE:timestamp}\] %{WORD:method} '%{URIHOST:host}' '%{PATH:path}' '(?:%{URIPARAM:param}|)' %{NUMBER:http_status} (?:%{NUMBER:bytes}|-) '(?:%{URI:referrer}|-)' %{QS:agent} %{BASE10NUM:berespms} %{WORD:cache_handling} (?:%{QS:jafapp}|-)

Here is an example of the log output:

91.41.222.50, 172.30.1.1 - [07/Jun/2016:17:11:30 +0200] GET 'www.myhost.com' '/the/path/' '' 200 155725 'http://www.my_referrer.com/another/path/' 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko' 0.000049 hit '-'

The current pattern makes Logstash use all available CPU and eventually hang.

I have about 70-80 log lines per second, shipped from Filebeat (via a Redis broker).

Any help would be much appreciated.

Thanks,

Mike

Logstash hanging because of expensive regexps sounds weird. One thing that might help with the CPU load is replacing each single-quoted '%{SOMEPATTERN:...}' with '(?<...>[^']+)' (or [^']* for a field that can be empty, like the query string in your example). That could save the regexp engine from having to backtrack: it can simply consume characters until it finds the next single quote.
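
To make that concrete, here is a rough sketch in plain Ruby rather than a Logstash config, since Ruby regexps use the same (?<name>...) capture syntax. The constant and method names are made up for illustration, the unquoted fields are matched more loosely than the stock grok patterns would, and I have only checked it against the sample line from the post above.

```ruby
# Sample line copied from the original post.
SAMPLE_LINE = "91.41.222.50, 172.30.1.1 - [07/Jun/2016:17:11:30 +0200] GET " \
              "'www.myhost.com' '/the/path/' '' 200 155725 " \
              "'http://www.my_referrer.com/another/path/' " \
              "'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko' " \
              "0.000049 hit '-'"

# Hypothetical pattern: every single-quoted field is captured with [^']*,
# so the engine just scans forward to the closing quote.
# The /x flag ignores literal whitespace, hence the explicit \s between fields.
VARNISH_LINE = /
  \A(?<ip1>[^,\s]+),\s(?<ip2>\S+)\s-\s
  \[(?<timestamp>[^\]]+)\]\s
  (?<method>\S+)\s
  '(?<host>[^']*)'\s
  '(?<path>[^']*)'\s
  '(?<param>[^']*)'\s
  (?<http_status>\d+)\s
  (?<bytes>\d+|-)\s
  '(?<referrer>[^']*)'\s
  '(?<agent>[^']*)'\s
  (?<berespms>[\d.]+)\s
  (?<cache_handling>\S+)\s
  '(?<jafapp>[^']*)'\z
/x

# Returns a Hash of field name => captured string, or nil if the line does not match.
def parse_varnish_line(line)
  match = VARNISH_LINE.match(line)
  match && match.named_captures
end

fields = parse_varnish_line(SAMPLE_LINE)
puts fields.inspect
```

In the grok filter the same idea would be written inline, for example '(?<referrer>[^']*)' in place of '(?:%{URI:referrer}|-)'. Note that this captures the bare "-" as the field value instead of leaving the field unset.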

Hi Magnus,

Thanks for the reply.
Actually, I wrote a manual parser in Ruby in order to identify the culprit.

It seems that '(?:%{URI:referrer}|-)' did not like referrers of the form "android-app://some-android-store-url.com": it would block and use all available CPU.

I changed URI to NOTSPACE and the problem went away.
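
For anyone who wants to check the replacement outside Logstash, here is a small Ruby sketch of what the referrer field now accepts. NOTSPACE is defined as \S+ in the stock grok patterns. The constant and method names are just for illustration, and this only tests the quoted field in isolation, not the whole line.

```ruby
# Hypothetical stand-in for '%{NOTSPACE:referrer}': any run of non-space
# characters between the single quotes, with no URI structure required.
REFERRER_FIELD = /\A'(?<referrer>\S+)'\z/

# Returns the referrer without its quotes, or nil if the field does not match.
def referrer_of(field)
  match = REFERRER_FIELD.match(field)
  match && match[:referrer]
end

puts referrer_of("'android-app://some-android-store-url.com'")
puts referrer_of("'http://www.my_referrer.com/another/path/'")
```

The trade-off is that nothing is validated any more: a referrer containing a space, or an empty '' field, would not match this form and would need its own handling.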

Pretty fragile stuff.

Mike