Hey folks,
There was a nice article on the Elastic homepage a while ago about the performance of grok filters, so I set up this script to measure the filter's throughput.
As I understand it, using %{DATA:sth} is mostly a bad habit and results in slow filters, so I tried getting rid of those and using more specific built-in grok patterns instead. However, this did not result in higher throughput in the script.
An example:
The log looks like this:
2017-02-14 14:33:22\ttimezone:+1\servername(program)\t[11902] PASS username@domain IP.Ad.re.ss blagroup filterbla http://www.example.com/j/guet.gif GET
Our current/previous pattern was this:
%{DATA:date}\ttimezone:%{DATA:timezone}\t%{DATA:ddd}(%{DATA:program})\t[[0-9]{1,7}]{1,2}%{DATA:action}{1,2}%{DATA:username}{1,30}%{DATA:src}{1,9}%{DATA:usergroup}{1,16}%{DATA:filtergroup}{1,30}%{DATA:url}{1,7}%{DATA:method}$
With this I get about 150k/s throughput with the provided script.
I then replaced the %{DATA} parts with more specific built-in patterns:
%{TIMESTAMP_ISO8601:date}\ttimezone:%{NOTSPACE:timezone}\t%{NOTSPACE:ddd}(%{NOTSPACE:program})\t[[0-9]{1,7}]{1,2}%{WORD:action}{1,2}%{NOTSPACE:username}%{IPV4:src}{1,9}%{NOTSPACE:usergroup}{1,16}%{WORD:filtergroup}{1,30}%{URI:url}%{WORD:method}
This, however, resulted in only 90k/s throughput.
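In case someone wants to poke at this outside of Logstash, here is a minimal sketch in Python that compares the two styles on a single line. Some loud assumptions: the sample line is a shortened, made-up version of the log above; the regexes are rough stand-ins (%{DATA} expands to `.*?` and %{NOTSPACE} to `\S+`, the rest is hand-written and not the real grok definitions); and the names `LAZY`, `SPECIFIC`, `fields` and `matches_per_second` are just for illustration. Python's `re` is also a different regex engine from the one Logstash uses, so the numbers will not carry over directly. It only shows the method.

```python
import re
import timeit

# Hypothetical, shortened log line (fewer fields than the real one).
LINE = ("2017-02-14 14:33:22\ttimezone:+1\tservername(program)\t[11902] "
        "PASS user@example.com 10.0.0.1 GET")

# Every field captured with a lazy ".*?", which is what %{DATA} expands to.
LAZY = re.compile(
    r"^(?P<date>.*?)\ttimezone:(?P<timezone>.*?)\t(?P<host>.*?)\((?P<program>.*?)\)"
    r"\t\[\d{1,7}\]\s(?P<action>.*?)\s(?P<username>.*?)\s(?P<src>.*?)\s(?P<method>.*?)$"
)

# Hand-written, more specific stand-ins (not the actual grok definitions).
SPECIFIC = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\ttimezone:(?P<timezone>\S+)"
    r"\t(?P<host>[^\s(]+)\((?P<program>[^)]+)\)"
    r"\t\[\d{1,7}\]\s(?P<action>\w+)\s(?P<username>\S+)"
    r"\s(?P<src>\d{1,3}(?:\.\d{1,3}){3})\s(?P<method>\w+)$"
)


def fields(pattern, line=LINE):
    """Return the named captures as a dict, or None if the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


def matches_per_second(pattern, line=LINE, number=100_000):
    """Time `number` match attempts and return attempts per second."""
    seconds = timeit.timeit(lambda: pattern.match(line), number=number)
    return number / seconds


if __name__ == "__main__":
    # Both patterns should extract identical fields before comparing speed.
    assert fields(LAZY) == fields(SPECIFIC)
    for name, pattern in (("lazy", LAZY), ("specific", SPECIFIC)):
        print(f"{name}: {matches_per_second(pattern):,.0f} matches/s")
```

It probably makes sense to also time a line that does not match, since a benchmark with only matching lines may hide the cost of failed matches.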
So my question: Is my assumption right that %{DATA} is slow? Is there any mistake I made with the patterns?
Thank you for your response.