How to logstash patterns/regex/co


I'm just playing around with logstash and hope to get help with a few topics.

I'm trying to log some HP ProCurve switch syslog messages into Elasticsearch. I used the example filter from the website, which looks like this:
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }

The part %{DATA:syslog_program} captures the program that the message comes from.

My switch sends the following message for example:
May 15 14:04:23 00179 mgr: SME SSH from - MANAGER Mode

1) I get the following output on the command line:

         "syslog_hostname" => "",
             "received_at" => "2020-05-15T12:04:23.500Z",
              "@timestamp" => 2020-05-15T12:04:23.000Z,
          "syslog_message" => " SME SSH from - MANAGER Mode",
                 "message" => "<46> May 15 14:04:23  00179 mgr:  SME SSH from - MANAGER Mode",
        "syslog_timestamp" => "May 15 14:04:23",
                "facility" => 0,
                "@version" => "1",
                "severity" => 0,
                    "host" => "",
                    "type" => "syslog",
          "syslog_program" => "00179 mgr",
           "received_from" => " ",
                "priority" => 0,
                    "tags" => [
            [0] "_grokparsefailure_sysloginput"
          "facility_label" => "kernel",
          "severity_label" => "Emergency"

Why is there a _grokparsefailure_sysloginput? What is the cause for that? How can I interpret this?

2) As you can see, there is also a number in front of the syslog program. I don't know whether this is RFC-compliant behavior from the switch. I tried to write a regex to exclude this number, because I don't need it and want cleaner output.

I tried the following regex, which works in online regex testers but not in Logstash. My pattern file looks like this:
HPPROGRAM \D+\:\s{2}

And my filter:

    filter {
      if [type] == "syslog" {
        grok {
          patterns_dir => ["/etc/logstash/patterns"]
          match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{HPPROGRAM :syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
          add_field => [ "received_at", "%{@timestamp}" ]
          add_field => [ "received_from", "%{host}" ]
        date {
          match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]

But it simply doesn't work; the event ends up with the following tags:
[0] "_grokparsefailure_sysloginput",
[1] "_grokparsefailure"

Isn't it right that my regex should match the program string if I want to extract the program string?! I'm really confused.

3) Where are the default patterns stored?
4) How can I use regex match groups?

Can you post the final mapping you expect?

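This portion maps syslog_program to "00179 mgr" because DATA is defined as .*?, which matches any characters, including the leading number and the space, up to the point where the rest of the pattern (here the colon) can match.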
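To see why the lazy DATA pattern still swallows the number, here is a minimal Python sketch. It is not grok itself: the regex is a hand-written approximation of the tail of the filter, with `.*?` standing in for DATA, `[1-9][0-9]*` for POSINT and `.*` for GREEDYDATA, and the sample line has the timestamp and host removed for brevity.

```python
import re

# Tail of the sample line from the question (timestamp and host removed).
line = "00179 mgr: SME SSH from - MANAGER Mode"

# Approximation of: %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}
data_style = re.compile(
    r"(?P<syslog_program>.*?)"               # lazy: grows one character at a time
    r"(?:\[(?P<syslog_pid>[1-9][0-9]*)\])?"  # optional [pid]
    r": (?P<syslog_message>.*)"              # literal ": " followed by the rest
)

m = data_style.match(line)
print(repr(m.group("syslog_program")))  # '00179 mgr'
print(repr(m.group("syslog_message")))  # 'SME SSH from - MANAGER Mode'
```

The lazy quantifier only stops once the following `: ` can match, so everything before the first colon, number included, lands in syslog_program.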

You can find the grok patterns here

I only want "mgr" (or whatever other string my switch sends) without the number.

match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{INT} %{WORD:syslog_program}: %{GREEDYDATA:syslog_message}" }

Drop INT if you don't want it, or replace it with \d+.
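As a rough check of the idea, the number-then-word part of that pattern can be tried in Python. This is only a sketch: the regex pieces are approximations of the grok INT, WORD and GREEDYDATA definitions, not the grok engine, and the line is the sample from the question.

```python
import re

line = "May 15 14:04:23 00179 mgr: SME SSH from - MANAGER Mode"

int_word = re.compile(
    r"(?:[+-]?[0-9]+) "              # stand-in for %{INT}: matched but not captured
    r"(?P<syslog_program>\b\w+\b)"   # stand-in for %{WORD:syslog_program}
    r": (?P<syslog_message>.*)"      # stand-in for %{GREEDYDATA:syslog_message}
)

m = int_word.search(line)
print(m.group("syslog_program"))  # mgr
print(m.group("syslog_message"))  # SME SSH from - MANAGER Mode
```

Because the number is matched outside the named group, it is consumed without becoming part of syslog_program.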


The syslog input applies a grok_pattern, which by default parses off the PRI at the beginning (the number in angle brackets). If that grok fails then it adds that tag.

What does PRI stand for?

Priority. RFC 3164 defines it.
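For reference, RFC 3164 computes the PRI value as facility * 8 + severity, so the `<46>` in the sample message can be decoded by hand. A small Python sketch (the function name `decode_pri` is just for illustration):

```python
# Severity names in RFC 3164 order, index = numerical code.
SEVERITIES = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]

def decode_pri(pri):
    """Split a PRI value into (facility, severity): PRI = facility * 8 + severity."""
    return divmod(pri, 8)

facility, severity = decode_pri(46)
print(facility, severity, SEVERITIES[severity])  # 5 6 Informational
```

So `<46>` means facility 5 with severity 6 (Informational), which differs from the zeros shown in the output in the question.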
