Syslog grok patterns and pipeline performance

I've been working with RHEL syslogs (/var/log/secure and /var/log/messages) that are shipped to Logstash via Filebeat. I created grok patterns for each of the relevant log lines, and each one adds an extra field with the value "true" when the line matches something of interest that I would like to query on (e.g. failed SSH login, password change, etc.). So the groks look like the examples below, just many more of them.

I was reading through the "Do you grok Grok?" article (https://www.elastic.co/blog/do-you-grok-grok), and it appears that anchors and other modifications could improve performance. However, I was interested to see whether there is a more efficient way to run through all of the grok patterns and create the fields/tags that assist with our queries (maybe a custom grok patterns file for these syslogs and/or some if/then conditionals)? Thanks!
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{WORD:pam_type}\(%{DATA:pam_message}\): auth could not identify password for \[%{USER:pam_username}\]" }
  add_field => { "linux_password_failure" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
grok {
  match => { "message" => "Invalid user %{USER:username} from %{IP:src_ip}" }
  add_field => { "ssh_invalid_user" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: Failed password for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2" }
  add_field => { "ssh_failed_password" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: Accepted %{WORD:auth_method} for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2" }
  add_field => { "ssh_auth_success" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
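
From the article, it sounds like just anchoring these would already help, since each pattern starts at the beginning of the line anyway; e.g. the failed-password grok with a leading ^ (my untested reading of the advice):

grok {
  match => { "message" => "^%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: Failed password for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2" }
  add_field => { "ssh_failed_password" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}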

I find myself writing more regexes than grok patterns these days.

I have logs coming in from Windows devices, Unix devices, printers, network hardware, appliances, etc.

I have a base grok pattern that I use to create the initial fields (host, etc.), and then a catch-all field that I run grok patterns or regexes against. That way, even if none of my additional parsing works, I still have the base log info split up.
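
A minimal sketch of that two-stage approach (the syslog_message field name and the second-stage pattern are illustrative, not my exact config):

# Stage 1: peel off the syslog header; whatever follows lands in a
# catch-all field (syslog_message here).
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
}

# Stage 2: test the (much shorter) catch-all field, anchored at the start,
# instead of re-scanning the full message.
grok {
  match => { "syslog_message" => "^Invalid user %{USER:username} from %{IP:src_ip}" }
  add_field => { "ssh_invalid_user" => "true" }
  tag_on_failure => []
}

Setting tag_on_failure to an empty array avoids the add-then-remove dance with _grokparsefailure for patterns that are expected to miss on most lines.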

Thanks Jason! I was thinking about creating a base pattern, since most of them start with %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?:. If I didn't need the additional field/tag, I could put all of the grok patterns together and rely on break_on_match, but I'm not sure that is possible in this case. Maybe the use of conditionals could help as well (something like the sketch below)? Anyone have any ideas?
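
Here's roughly what I mean (untested; it assumes the header has already been stripped into a syslog_message field as above, and recovers the marker fields from which captures are present):

# break_on_match defaults to true, so grok stops at the first pattern
# that matches; the patterns are tried in order for each event.
grok {
  match => { "syslog_message" => [
    "^%{WORD:pam_type}\(%{DATA:pam_message}\): auth could not identify password for \[%{USER:pam_username}\]",
    "^Accepted %{WORD:auth_method} for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2",
    "^Failed password for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2",
    "^Invalid user %{USER:username} from %{IP:src_ip}"
  ] }
  tag_on_failure => []
}

# Derive the marker fields from which captures exist; the order matters
# because later branches capture a subset of the earlier ones' fields.
if [pam_username] {
  mutate { add_field => { "linux_password_failure" => "true" } }
} else if [auth_method] {
  mutate { add_field => { "ssh_auth_success" => "true" } }
} else if [src_port] {
  mutate { add_field => { "ssh_failed_password" => "true" } }
} else if [username] {
  mutate { add_field => { "ssh_invalid_user" => "true" } }
}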

Yep, I have a very bare-bones base pattern, basically to peel off the header and the service/process name and PID (if it has one). The rest lands in a catch-all field that I build regex tests against.
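
For the custom patterns file idea mentioned earlier, something like this could work (the RHEL_SYSLOGBASE name and the patterns directory path are made up for illustration; grok also ships a stock SYSLOGBASE pattern that is close to this):

# /etc/logstash/patterns/linux-syslog
RHEL_SYSLOGBASE %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?:

# And in the pipeline config:
grok {
  patterns_dir => ["/etc/logstash/patterns"]
  match => { "message" => "^%{RHEL_SYSLOGBASE} %{GREEDYDATA:syslog_message}" }
}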
