Varnish grok pattern optimization


I am looking for help in optimizing my grok pattern for my varnishncsa logs.

This is what I have now:

%{IP:ip1}, %{IP:ip2} - \[%{HTTPDATE:timestamp}\] %{WORD:method} '%{URIHOST:host}' '%{PATH:path}' '(?:%{URIPARAM:param}|)' %{NUMBER:http_status} (?:%{NUMBER:bytes}|-') '(?:%{URI:referrer}|-)' %{QS:agent} %{BASE10NUM:berespms} %{WORD:cache_handling} (?:%{QS:jafapp}|-)"

Here is an example of the log output:

, - [07/Jun/2016:17:11:30 +0200] GET '' '/the/path/' '' 200 155725 '' 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko' 0.000049 hit '-'

The current pattern makes Logstash use all available CPU and eventually hang.

I have about 70-80 log lines per second, shipped from Filebeat (via a Redis broker).

Any help would be much appreciated.



Logstash hanging because of expensive regexps sounds weird. One thing that might help with the CPU load is replacing '%{SOMEPATTERN:...}' with '(?<...>[^']+)'. That could save the regexp engine from having to backtrack. It can simply continue until it finds the next single quote.
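To illustrate the idea outside of Logstash, here is a minimal Python sketch of a parser that treats every single-quoted field as "anything up to the next quote". It is not the grok pattern itself: Python spells named groups `(?P<name>...)` where grok's regexp engine uses `(?<name>...)`, and it uses `[^']*` rather than `[^']+` so that empty fields such as `''` still match. The sample line is made up in the shape of the one posted above; its IPs, host, and referrer are placeholders.

```python
import re

# Quoted fields use a negated character class, so each one is consumed in a
# single pass up to the closing quote and the engine has nothing to retry.
LINE_RE = re.compile(
    r"(?P<ip1>[^,\s]+), (?P<ip2>\S+) - "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"(?P<method>\S+) "
    r"'(?P<host>[^']*)' "
    r"'(?P<path>[^']*)' "
    r"'(?P<param>[^']*)' "
    r"(?P<http_status>\d+) "
    r"(?P<bytes>\d+|-) "
    r"'(?P<referrer>[^']*)' "
    r"'(?P<agent>[^']*)' "
    r"(?P<berespms>\S+) "
    r"(?P<cache_handling>\S+) "
    r"'(?P<jafapp>[^']*)'"
)

# Hypothetical sample: IPs, host and referrer are placeholders, not real data.
SAMPLE = (
    "192.0.2.1, 198.51.100.7 - [07/Jun/2016:17:11:30 +0200] GET "
    "'example.com' '/the/path/' '' 200 155725 "
    "'android-app://com.example.app' "
    "'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko' "
    "0.000049 hit '-'"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

The trade-off is that nothing inside the quotes is validated any more: a malformed host or URI is captured as-is, so any checking has to happen in a later step. A value that itself contains a single quote would also break the match.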

Hi Magnus,

Thanks for the reply.
I actually wrote a manual parser in Ruby in order to identify the culprit.

It seems that '(?:%{URI:referrer}|-)' did not like referrers of the form "android-app://": on those lines the match would block and use all available CPU.

I changed it from URI to NOTSPACE and the problem went away.
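For reference, NOTSPACE in the stock grok patterns is just `\S+`, so the referrer field no longer cares about the URI scheme at all. The Python sketch below shows roughly what the changed fragment does; the function name and the sample referrers are made up for illustration.

```python
import re

# Rough equivalent of '(?:%{NOTSPACE:referrer}|-)' with NOTSPACE written out
# as \S+ (Python spells the named group (?P<referrer>...)).
REFERRER_RE = re.compile(r"'(?:(?P<referrer>\S+)|-)'")


def extract_referrer(fragment):
    """Return the referrer captured at the start of fragment, or None."""
    match = REFERRER_RE.match(fragment)
    return match.group("referrer") if match else None


if __name__ == "__main__":
    # Hypothetical referrer values.
    print(extract_referrer("'android-app://com.example.app' 'Mozilla/5.0'"))
    print(extract_referrer("'http://example.com/a?b=c'"))
    print(extract_referrer("'-'"))
```

Two side effects are worth knowing about: `\S+` also matches a lone `-`, so the `|-` alternative never fires and an absent referrer is captured as the string "-", and a referrer containing a space would no longer match.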

Pretty fragile stuff.