Multiple Grok filters with Logstash 6 for a single logfile?

I want to parse Debian's auth.log using multiple Grok match patterns and assign individual tags to them.

Here is an exemplary auth.log:

Dec  4 13:39:23 deb sshd[970]: Server listening on 0.0.0.0 port 22.
Dec  4 13:39:23 deb sshd[970]: Server listening on :: port 22.
Dec  4 13:39:30 deb sshd[972]: Invalid user fakeuser from 10.0.2.2 port 61624
Dec  4 13:39:30 deb sshd[972]: input_userauth_request: invalid user fakeuser [preauth]
Dec  4 13:39:33 deb sshd[972]: error: Received disconnect from 10.0.2.2 port 61624:13: Unable to authenticate [preauth]
Dec  4 13:39:33 deb sshd[972]: Disconnected from 10.0.2.2 port 61624 [preauth]
Dec  4 13:39:40 deb sshd[974]: Accepted password for burnersk from 10.0.2.2 port 61633 ssh2
Dec  4 13:39:40 deb sshd[974]: pam_unix(sshd:session): session opened for user burnersk by (uid=0)
Dec  4 13:39:40 deb systemd-logind[484]: New session 4 of user burnersk.
Dec  4 13:39:43 deb sudo:  burnersk : TTY=pts/1 ; PWD=/home/burnersk ; USER=root ; COMMAND=/bin/su -
Dec  4 13:39:43 deb sudo: pam_unix(sudo:session): session opened for user root by burnersk(uid=0)
Dec  4 13:39:43 deb su[993]: Successful su for root by root
Dec  4 13:39:43 deb su[993]: + /dev/pts/1 root:root
Dec  4 13:39:43 deb su[993]: pam_unix(su:session): session opened for user root by burnersk(uid=0)
Dec  4 13:39:43 deb su[993]: pam_systemd(su:session): Cannot create session: Already running in a session
Dec  4 13:39:44 deb su[993]: pam_unix(su:session): session closed for user root
Dec  4 13:39:44 deb sudo: pam_unix(sudo:session): session closed for user root
Dec  4 13:39:45 deb sshd[974]: pam_unix(sshd:session): session closed for user burnersk
Dec  4 13:39:45 deb systemd-logind[484]: Removed session 4.
Dec  4 13:55:13 deb sshd[970]: Received signal 15; terminating.
Dec  4 13:55:27 deb sudo:     root : TTY=pts/0 ; PWD=/var/log ; USER=root ; COMMAND=/bin/echo hi
Dec  4 13:55:27 deb sudo: pam_unix(sudo:session): session opened for user root by burnersk(uid=0)
Dec  4 13:55:27 deb sudo: pam_unix(sudo:session): session closed for user root

Here are the working Grok match patterns, that can parse (excl. "+ /dev/pts/1 root:root") the above auth.log example:

%{SYSLOGBASE}%{SPACE}Server listening on %{IPORHOST:address} port %{POSINT:port}.
%{SYSLOGBASE}%{SPACE}Invalid user %{GREEDYDATA:user} from %{IPORHOST:address} port %{POSINT:port}
%{SYSLOGBASE}%{SPACE}(?<method>[^:]+):%{SPACE}invalid user %{GREEDYDATA:user} \[(?<stage>[^\]]+)\]
%{SYSLOGBASE}%{SPACE}error: Received disconnect from %{IPORHOST:address} port %{POSINT:port}:%{INT}: Unable to authenticate \[(?<stage>[^\]]+)\]
%{SYSLOGBASE}%{SPACE}Disconnected from %{IPORHOST:address} port %{POSINT:port} \[(?<stage>[^\]]+)\]
%{SYSLOGBASE}%{SPACE}Accepted password for %{USERNAME:user} from %{IPORHOST:address} port %{POSINT:port} ssh2
%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): session opened for user %{USERNAME:user} by %{USERNAME:source_user}?\(uid=%{NONNEGINT:source_user_id}\)
%{SYSLOGBASE}%{SPACE}New session %{NONNEGINT:session_id} of user %{USERNAME:user}.
%{SYSLOGBASE}%{SPACE}%{USERNAME:user}%{SPACE}:%{SPACE}TTY=%{DATA:tty}%{SPACE};%{SPACE}PWD=%{DATA:pwd}%{SPACE};%{SPACE}USER=%{USERNAME:target_user}%{SPACE};%{SPACE}COMMAND=%{GREEDYDATA:command}
%{SYSLOGBASE}%{SPACE}Successful su for %{USERNAME:target_user} by %{USERNAME:user}
%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): Cannot create session: %{GREEDYDATA:error}
%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): session closed for user %{USERNAME:user}
%{SYSLOGBASE}%{SPACE}Removed session %{NONNEGINT:session_id}.
%{SYSLOGBASE}%{SPACE}Received signal %{NONNEGINT:posix_signal}; %{GREEDYDATA:action}.

This is my (partial) Logstash 6 configuration approach which fails:

input {
  file {
    path => [ "/var/log/auth.log" ]
    type => "linux_auth_log"
  }
}
filter {
  if [type] == "linux_auth_log" {
    grok {
      match => { "message" => "%{SYSLOGBASE}%{SPACE}%{USERNAME:user}%{SPACE}:%{SPACE}TTY=%{DATA:tty}%{SPACE};%{SPACE}PWD=%{DATA:pwd}%{SPACE};%{SPACE}USER=%{USERNAME:target_user}%{SPACE};%{SPACE}COMMAND=%{GREEDYDATA:command}" }
      if [program] {
        add_tag => [ "%{program}", "%{program}_exec" ]
      }
      add_tag => [ "exec" ]
    }
    grok {
      match => { "message" => "%{SYSLOGBASE}%{SPACE}(?<pam_module>[^\(]+)\((?<pam_module_type>[^\)]+)\): session opened for user %{USERNAME:user} by %{USERNAME:source_user}?\(uid=%{NONNEGINT:source_user_id}\)" }
      if [program] {
        add_tag => [ "%{program}", "%{program}_session", "%{program}_session_opened" ]
      }
      add_tag => [ "session", "session_opened" ]
    }
    grok {
      match => { "message" => "%{SYSLOGBASE}%{SPACE}%{GREEDYDATA:log_message}" }
      if [program] {
        add_tag => [ "%{program}" ]
      }
    }
  }
}
output {
  if [type] == "linux_auth_log" {
    elasticsearch {}
  }
}

My intention with this configuration was that Grok is going through the patterns from top to bottom an only selects the first matching pattern. It should ignore the rest. The last Grok pattern is to catch all the non-matching log messages (catch-all rule).

Having only one Grok section with filters active (comment-out the others) it works. But when two or more Grok sections with filters are active, all messages will receive a _grokparsefailure tag.

My question is: How to achieve such a high flexible configuration with multiple pattern matches and individual tags.

PS: Cross-post-notice - I raised a Stackoverflow question about that but didn't got any answers.

My intention with this configuration was that Grok is going through the patterns from top to bottom an only selects the first matching pattern. It should ignore the rest. The last Grok pattern is to catch all the non-matching log messages (catch-all rule).

That's the behavior if you have a single grok filter that includes multiple expressions. With multiple filters like this they'll all be processed in the order listed. You could do something like this:

grok {
  match => ...
  add_tag => ["exec"]
}
if [program] {
  mutate {
    add_tag => ["%{program}", "%{program}_exec"]
  }
}
if "_grokparsefailure" not in [tags] {
  grok {
    match => ...
    add_tag => ["session", "session_opened"]
  }
  if [program] {
    mutate {
      add_tag => ["%{program}", "%{program}_exec", "%{program}_session", "%{program}_session_opened"]
    }
  }
}
...

@magnusbaeck: If I understand you correctly, I could achieve my goal by doing partial Grok matches, correct?

grok {
  match => { "message" => "%{SYSLOGBASE}%{SPACE}%{GREEDYDATA:log_message}" }
}
if "_grokparsefailure" not in [tags] {
  #
  # Add program label tag
  if [program] {
    mutate { add_tag => [ "%{program}" ] }
  }
  #
  # SSH server
  if [program] == "sshd" {
    #
    # SSH server starts listening for incoming connections
    if [log_message] =~ "Server·listening·on" {
      grok {
        match => { log_message => "Server listening on %{IPORHOST:address} port %{POSINT:port}." }
        add_tag => [ "%{program}_server", "%{program}_server_start" ]
      }
    }
    #
    # SSH server receives a POSIX signal
    if [log_message] =~ "Received signal" {
      grok {
        match => { log_message => "Received signal %{NONNEGINT:posix_signal}; %{GREEDYDATA:action}." }
        add_tag => [ "%{program}_server" ]
      }
      if [action] == "terminating" {
        mutate { add_tag => [ "%{program}_server_stop" ] }
      }
    }
  }
}

That's not what I said, but your suggestion would also work.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.