Syslog grok patterns and pipeline performance

I've been working with RHEL syslogs (/var/log/secure and /var/log/messages) that are shipped to Logstash via Filebeat. I created grok patterns for each of the relevant log lines, and each one adds an extra field with the value "true" when the line matches something of interest that I would like to query on (e.g. failed SSH login, password change, etc.). So the groks look like the examples below, just many more of them.

I was reading through the "Do you grok Grok?" article (https://www.elastic.co/blog/do-you-grok-grok), and it appears that anchors and other modifications could improve performance. However, I was interested to see whether there is a more efficient way to run through all of the grok patterns and create the fields/tags that assist with our queries (maybe a custom grok patterns file for these syslogs and/or some if/then conditionals)? Thanks!
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{WORD:pam_type}\(%{DATA:pam_message}\): auth could not identify password for \[%{USER:pam_username}\]" }
  add_field => { "linux_password_failure" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
grok {
  match => { "message" => "Invalid user %{USER:username} from %{IP:src_ip}" }
  add_field => { "ssh_invalid_user" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: Failed password for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2" }
  add_field => { "ssh_failed_password" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: Accepted %{WORD:auth_method} for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2" }
  add_field => { "ssh_auth_success" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}
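
From the article, it sounds like just anchoring these would already help, since each pattern starts at the beginning of the line anyway; e.g. the failed-password grok with a leading ^ (my untested reading of the advice):

grok {
  match => { "message" => "^%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: Failed password for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2" }
  add_field => { "ssh_failed_password" => "true" }
  remove_tag => [ "_grokparsefailure" ]
}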

I find myself writing more regexes than grok patterns these days.

I have logs coming in from Windows devices, Unix devices, printers, network hardware, appliances, etc.

I have a base grok pattern that I use to create the initial fields (host, etc.), and then a catch-all field that I run grok patterns or regexes against. That way, even if none of my additional parsing works, I still have the base log info split up.
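
A minimal sketch of that two-stage approach (the syslog_message field name and the second-stage pattern are illustrative, not my exact config):

# Stage 1: peel off the syslog header; whatever follows lands in a
# catch-all field (syslog_message here).
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
}

# Stage 2: test the (much shorter) catch-all field, anchored at the start,
# instead of re-scanning the full message.
grok {
  match => { "syslog_message" => "^Invalid user %{USER:username} from %{IP:src_ip}" }
  add_field => { "ssh_invalid_user" => "true" }
  tag_on_failure => []
}

Setting tag_on_failure to an empty array avoids the add-then-remove dance with _grokparsefailure for patterns that are expected to miss on most lines.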

Thanks Jason! I was thinking about creating a base pattern, since most of them start with %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?:. If I didn't need the additional field/tag, I could put all of the grok patterns together and rely on break_on_match, but I'm not sure that is possible in this case. Maybe the use of conditionals could help as well (something like the sketch below)? Anyone have any ideas?
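
Here's roughly what I mean (untested; it assumes the header has already been stripped into a syslog_message field as above, and recovers the marker fields from which captures are present):

# break_on_match defaults to true, so grok stops at the first pattern
# that matches; the patterns are tried in order for each event.
grok {
  match => { "syslog_message" => [
    "^%{WORD:pam_type}\(%{DATA:pam_message}\): auth could not identify password for \[%{USER:pam_username}\]",
    "^Accepted %{WORD:auth_method} for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2",
    "^Failed password for %{USER:username} from %{IP:src_ip} port %{INT:src_port} ssh2",
    "^Invalid user %{USER:username} from %{IP:src_ip}"
  ] }
  tag_on_failure => []
}

# Derive the marker fields from which captures exist; the order matters
# because later branches capture a subset of the earlier ones' fields.
if [pam_username] {
  mutate { add_field => { "linux_password_failure" => "true" } }
} else if [auth_method] {
  mutate { add_field => { "ssh_auth_success" => "true" } }
} else if [src_port] {
  mutate { add_field => { "ssh_failed_password" => "true" } }
} else if [username] {
  mutate { add_field => { "ssh_invalid_user" => "true" } }
}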

Yep, I have a very bare-bones base pattern, basically to peel off the header and the service/process name and PID (if it has one). The rest lands in a catch-all field that I build regex tests against.
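
For the custom patterns file idea mentioned earlier, something like this could work (the RHEL_SYSLOGBASE name and the patterns directory path are made up for illustration; grok also ships a stock SYSLOGBASE pattern that is close to this):

# /etc/logstash/patterns/linux-syslog
RHEL_SYSLOGBASE %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?:

# And in the pipeline config:
grok {
  patterns_dir => ["/etc/logstash/patterns"]
  match => { "message" => "^%{RHEL_SYSLOGBASE} %{GREEDYDATA:syslog_message}" }
}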
