I want to parse Debian's auth.log using multiple Grok match patterns and assign individual tags to them.
Here is an exemplary auth.log:
Dec 4 13:39:23 deb sshd[970]: Server listening on 0.0.0.0 port 22.
Dec 4 13:39:23 deb sshd[970]: Server listening on :: port 22.
Dec 4 13:39:30 deb sshd[972]: Invalid user fakeuser from 10.0.2.2 port 61624
Dec 4 13:39:30 deb sshd[972]: input_userauth_request: invalid user fakeuser [preauth]
Dec 4 13:39:33 deb sshd[972]: error: Received disconnect from 10.0.2.2 port 61624:13: Unable to authenticate [preauth]
Dec 4 13:39:33 deb sshd[972]: Disconnected from 10.0.2.2 port 61624 [preauth]
Dec 4 13:39:40 deb sshd[974]: Accepted password for burnersk from 10.0.2.2 port 61633 ssh2
Dec 4 13:39:40 deb sshd[974]: pam_unix(sshd:session): session opened for user burnersk by (uid=0)
Dec 4 13:39:40 deb systemd-logind[484]: New session 4 of user burnersk.
Dec 4 13:39:43 deb sudo: burnersk : TTY=pts/1 ; PWD=/home/burnersk ; USER=root ; COMMAND=/bin/su -
Dec 4 13:39:43 deb sudo: pam_unix(sudo:session): session opened for user root by burnersk(uid=0)
Dec 4 13:39:43 deb su[993]: Successful su for root by root
Dec 4 13:39:43 deb su[993]: + /dev/pts/1 root:root
Dec 4 13:39:43 deb su[993]: pam_unix(su:session): session opened for user root by burnersk(uid=0)
Dec 4 13:39:43 deb su[993]: pam_systemd(su:session): Cannot create session: Already running in a session
Dec 4 13:39:44 deb su[993]: pam_unix(su:session): session closed for user root
Dec 4 13:39:44 deb sudo: pam_unix(sudo:session): session closed for user root
Dec 4 13:39:45 deb sshd[974]: pam_unix(sshd:session): session closed for user burnersk
Dec 4 13:39:45 deb systemd-logind[484]: Removed session 4.
Dec 4 13:55:13 deb sshd[970]: Received signal 15; terminating.
Dec 4 13:55:27 deb sudo: root : TTY=pts/0 ; PWD=/var/log ; USER=root ; COMMAND=/bin/echo hi
Dec 4 13:55:27 deb sudo: pam_unix(sudo:session): session opened for user root by burnersk(uid=0)
Dec 4 13:55:27 deb sudo: pam_unix(sudo:session): session closed for user root
Here are the working Grok match patterns, that can parse (excl. "+ /dev/pts/1 root:root") the above auth.log example:
%{SYSLOGBASE}%{SPACE}Server listening on %{IPORHOST:address} port %{POSINT:port}.
%{SYSLOGBASE}%{SPACE}Invalid user %{GREEDYDATA:user} from %{IPORHOST:address} port %{POSINT:port}
%{SYSLOGBASE}%{SPACE}(?<method>[^:]+):%{SPACE}invalid user %{GREEDYDATA:user} \[(?<stage>[^\]]+)\]
%{SYSLOGBASE}%{SPACE}error: Received disconnect from %{IPORHOST:address} port %{POSINT:port}:%{INT}: Unable to authenticate \[(?<stage>[^\]]+)\]
%{SYSLOGBASE}%{SPACE}Disconnected from %{IPORHOST:address} port %{POSINT:port} \[(?<stage>[^\]]+)\]
%{SYSLOGBASE}%{SPACE}Accepted password for %{USERNAME:user} from %{IPORHOST:address} port %{POSINT:port} ssh2
%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): session opened for user %{USERNAME:user} by %{USERNAME:source_user}?\(uid=%{NONNEGINT:source_user_id}\)
%{SYSLOGBASE}%{SPACE}New session %{NONNEGINT:session_id} of user %{USERNAME:user}.
%{SYSLOGBASE}%{SPACE}%{USERNAME:user}%{SPACE}:%{SPACE}TTY=%{DATA:tty}%{SPACE};%{SPACE}PWD=%{DATA:pwd}%{SPACE};%{SPACE}USER=%{USERNAME:target_user}%{SPACE};%{SPACE}COMMAND=%{GREEDYDATA:command}
%{SYSLOGBASE}%{SPACE}Successful su for %{USERNAME:target_user} by %{USERNAME:user}
%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): Cannot create session: %{GREEDYDATA:error}
%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): session closed for user %{USERNAME:user}
%{SYSLOGBASE}%{SPACE}Removed session %{NONNEGINT:session_id}.
%{SYSLOGBASE}%{SPACE}Received signal %{NONNEGINT:posix_signal}; %{GREEDYDATA:action}.
This is my (partial) Logstash 6 configuration approach which fails:
input {
file {
path => [ "/var/log/auth.log" ]
type => "linux_auth_log"
}
}
filter {
if [type] == "linux_auth_log" {
grok {
match => { "message" => "%{SYSLOGBASE}%{SPACE}%{USERNAME:user}%{SPACE}:%{SPACE}TTY=%{DATA:tty}%{SPACE};%{SPACE}PWD=%{DATA:pwd}%{SPACE};%{SPACE}USER=%{USERNAME:target_user}%{SPACE};%{SPACE}COMMAND=%{GREEDYDATA:command}" }
if [program] {
add_tag => [ "%{program}", "%{program}_exec" ]
}
add_tag => [ "exec" ]
}
grok {
match => { "message" => "%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): session opened for user %{USERNAME:user} by %{USERNAME:source_user}?\(uid=%{NONNEGINT:source_user_id}\)" }
if [program] {
add_tag => [ "%{program}", "%{program}_session", "%{program}_session_opened" ]
}
add_tag => [ "session", "session_opened" ]
}
grok {
match => { "message" => "%{SYSLOGBASE}%{SPACE}%{GREEDYDATA:log_message}" }
if [program] {
add_tag => [ "%{program}" ]
}
}
}
}
output {
if [type] == "linux_auth_log" {
elasticsearch {}
}
}
My intention with this configuration was that Grok is going through the patterns from top to bottom an only selects the first matching pattern. It should ignore the rest. The last Grok pattern is to catch all the non-matching log messages (catch-all rule).
Having only one Grok section with filters active (comment-out the others) it works. But when two or more Grok sections with filters are active, all messages will receive a _grokparsefailure tag.
My question is: How to achieve such a high flexible configuration with multiple pattern matches and individual tags.
PS: Cross-post-notice - I raised a Stackoverflow question about that but didn't got any answers.