Getting _grokparsefailure as a tag

Hi

I was just wondering why I keep getting the _grokparsefailure tag on my syslog messages?
[Screenshot: a syslog event tagged _grokparsefailure]

My Logstash config file looks like this:

input {
  tcp {
    port => 5000
    type => syslog
  }
  udp {
    port => 5000
    type => syslog
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

output {
  elasticsearch {
    index => "logstash_syslog_index"
    hosts => ["localhost:9200"]
  }
}

Because the message field you show does not match the regexp you are giving to grok. When the pattern in match does not match, grok adds the _grokparsefailure tag to the event.

I'm fairly new to this, so I don't understand your explanation. I took this Logstash config from https://www.elastic.co/guide/en/logstash/current/config-examples.html
Could you explain it more simply, or give me a link where I can read up on this?

That pattern, which is essentially what you are using, would match the kind of message I would expect to see in /var/log/syslog on a Solaris server. Those look like this:

May 11 10:40:48 scrooge disk-health-nurse[26783]: [ID 702911 user.error] m:SY-mon-full-500 c:H : partition health measures for /var did not suffice

But your message starts with a PRI (the number in angle brackets), then has another number, followed by a timestamp that appears to include a timezone name (unless you have a host called CET). That does not look much like the syslog format the pattern expects.

Personally I would not even use grok for this. I think dissect is better (cheaper to run, easier to configure). With dissect I would use this (assuming CET is a timezone):

  dissect { mapping => { "message" => "<%{pri}>%{}: %{ts} %{+ts} %{+ts} %{+ts}: %{syslog_message}" } }
  date { match => [ "ts", "MMM d HH:mm:ss ZZZ" ] }
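Since dissect may be new to you, here is a sketch of how those two lines could sit inside the filter section of your original config, in place of the grok and date blocks. It assumes the message layout described above (PRI, a second number, then a timestamp ending in CET) and that CET really is a timezone; the field names are the ones from the mapping above.

```conf
filter {
  if [type] == "syslog" {
    # dissect splits on the literal text between the %{} fields:
    #   <%{pri}>  captures the number inside the angle brackets
    #   %{}       skips the second number without storing it
    #   %{+ts}    appends the next token to ts (joined with a space by default)
    dissect {
      mapping => { "message" => "<%{pri}>%{}: %{ts} %{+ts} %{+ts} %{+ts}: %{syslog_message}" }
    }
    # Parse e.g. "Apr 9 08:46:53 CET" into @timestamp; ZZZ matches a zone ID such as CET
    date {
      match => [ "ts", "MMM d HH:mm:ss ZZZ" ]
    }
  }
}
```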

If you really want or need to use grok, start with something like the pattern below. If that last word really is a timezone, you will need to glue it onto the timestamp using mutate+add_field before parsing it with date.

"<%{NUMBER}>%{NUMBER}: %{SYSLOGTIMESTAMP:syslog_timestamp} %{WORD:timezoneorhost}: %{GREEDYDATA:syslog_message}"
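A minimal sketch of the grok route with the mutate+add_field step, assuming the captured word is a timezone such as CET. The field name syslog_ts_zone is hypothetical; the other names come from the pattern above. The conditional skips the mutate and date steps when grok fails, so a literal "%{syslog_timestamp}" string is never built.

```conf
filter {
  if [type] == "syslog" {
    grok {
      # PRI and the number after it are matched but not captured
      match => { "message" => "<%{NUMBER}>%{NUMBER}: %{SYSLOGTIMESTAMP:syslog_timestamp} %{WORD:timezoneorhost}: %{GREEDYDATA:syslog_message}" }
    }
    # Only build and parse the timestamp if grok actually matched
    if "_grokparsefailure" not in [tags] {
      # Glue the zone onto the timestamp, e.g. "Apr 9 08:46:53" + " " + "CET"
      # (syslog_ts_zone is a hypothetical field name)
      mutate {
        add_field => { "syslog_ts_zone" => "%{syslog_timestamp} %{timezoneorhost}" }
      }
      date {
        match => [ "syslog_ts_zone", "MMM d HH:mm:ss ZZZ" ]
      }
    }
  }
}
```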

Note that in your date filter, you do not need two formats. You have

match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]

The first one should not have two spaces before the d, since your timestamp field does not. You can use

match => [ "syslog_timestamp", "MMM d HH:mm:ss" ]

since that will match both "Apr 9 08:46:53" and "Apr 19 08:46:53". I hope that helps.

Hi

I've tried your suggestion and after some tweaks I got it to work as I wanted, well almost. I just have one question: how can I get the severity level with dissect? Right now I get the PRI correctly, but I want to "translate" it or somehow get the correct log level. Should I use grok for this, or is there some way to do it in dissect?
Once again thank you Badger for the reply and the help this far!

If you want to translate the numeric PRI into a text string then the translate filter is what to use.
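The PRI packs two numbers together: PRI = facility × 8 + severity, so the severity code is PRI modulo 8 (for example, <34> is facility 4, severity 2, "critical"). A minimal sketch, assuming dissect stored the number in a field called pri as above: a ruby filter computes the code and translate maps it to a name. The field names severity_code and severity are hypothetical. `source` and `target` are the option names in recent versions of the translate plugin; older versions call them `field` and `destination`, so check the version you have installed.

```conf
filter {
  if [pri] {
    # Severity is the low three bits of PRI: severity = PRI % 8 (facility = PRI / 8).
    # Stored as a string so it matches the string keys in the dictionary below.
    ruby {
      code => 'event.set("severity_code", (event.get("pri").to_i % 8).to_s)'
    }
    # Map the numeric code to the RFC 5424 severity name
    translate {
      source => "severity_code"
      target => "severity"
      dictionary => {
        "0" => "emergency"
        "1" => "alert"
        "2" => "critical"
        "3" => "error"
        "4" => "warning"
        "5" => "notice"
        "6" => "informational"
        "7" => "debug"
      }
    }
  }
}
```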
