Trouble getting my grok on

Hello,

I am trying to get a reasonably simple filter for hmail working with logstash, but I'm not having much luck so far.

An example log line would be:

2016-01-14 16:03:05 sql@contoso.com dba@contoso.com 192.168.31.221 eu-smtp-inbound-1.contoso.com SMTP ? 250 10378

And my config so far for this looks like:

input {
  tcp {
    type => "hmail"
    port => "2525"
    codec => "json_lines"
  }
}
filter {
  if [type] == "hmail" {
    dns {
      reverse => [ "host" ]
      action => "replace"
    }
    mutate {
      add_tag => [ "hmail" ]
    }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:syslog_timestamp} %{WORD:Sender_Address} %{WORD:Recipient_Address} %{IPORHOST:Sender_Host} %{IPORHOST:Relayed_Host} %{WORD:Protocol} %{WORD:Funny_field} %{NUMBER:SMTP_Reply_Code} %{NUMBER:Session_Number}" }
    }
  }
}

I tried using %{EMAILADDRESS:Sender_Address}, but the EMAILADDRESS pattern doesn't exist on my install. Is there something better I could be using? I am running logstash 1.4.2.

I've added EMAILADDRESS and EMAILLOCALPART to the patterns and adjusted the filter:

match => { "message" => "%{TIMESTAMP_ISO8601:syslog_timestamp} %{WORD:Sender_Address} %{EMAILADDRESS:Recipient_Address} %{EMAILADDRESS:Sender_Host} %{IPORHOST:Relayed_Host} %{WORD:Protocol} %{WORD:Funny_field} %{NUMBER:SMTP_Reply_Code} %{NUMBER:Session_Number}" }

Unfortunately, I am still getting a grokparse failure.
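
For anyone else whose install lacks these patterns, here is a minimal sketch of adding them through a custom patterns file. The directory /etc/logstash/patterns and the file name "extra" are hypothetical, and the two definitions are simplified assumptions that are not necessarily identical to what later Logstash releases ship:

```
# Contents of the hypothetical file /etc/logstash/patterns/extra
# (simplified definitions, assumed for illustration):
#   EMAILLOCALPART [a-zA-Z0-9_.+-]+
#   EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}
filter {
  grok {
    # Tell grok where to find the extra pattern definitions.
    patterns_dir => [ "/etc/logstash/patterns" ]
    match => { "message" => "%{EMAILADDRESS:Sender_Address}" }
  }
}
```

HOSTNAME is one of the stock grok patterns, so only the two new names need to be defined.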

Off the top of my head:

  • Sender_Host should probably use IPORHOST.
  • Funny_field can't use WORD because "?" isn't a word character. I suggest NOTSPACE instead.

Ah, thanks for that. I have switched Funny_field over to NOTSPACE, but I'm still getting a grokparse failure.

Looking at the raw or JSON message, the fields seem to be separated with \t

e.g.

"_index": "logstash-2016.01.18",
"_type": "hmail",
"_id": "AVJUNvZsKwIKfbmqiO8X",
"_score": null,
"_source": {
  "message": "2016-01-18 10:10:26\tusr@contoso.com\tother.test@contoso.com\t192.168.39.128\teu-smtp-inbound-2.contoso.com\tSMTP\t?\t250\t2054\r",
  "@version": "1",
  "@timestamp": "2016-01-18T10:10:34.088Z",
  "host": "10.9.3.40:50967",
  "type": "hmail",
  "tags": [
    "hmail",
    "_grokparsefailure"
  ],
  "@source_host_ip": "%{@source_host}"
}

Is that just how the message is being displayed, or is it something I need to handle in the filter?

Oh, it's a tab-separated file. Instead of spaces in your expression use \s (or %{SPACE}) to match any whitespace characters. You should also be able to use a csv filter to split up the lines.
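
The csv route could look roughly like the sketch below. The column names are simply borrowed from the grok expression earlier in the thread. The separator is an assumption worth checking: it is meant to be a literal tab character typed between the quotes, because older Logstash versions may not interpret "\t" in the config file as a tab.

```
filter {
  if [type] == "hmail" {
    csv {
      # Assumption: a literal tab character sits between the quotes below.
      separator => "	"
      columns => [ "timestamp", "Sender_Address", "Recipient_Address", "Sender_Host", "Relayed_Host", "Protocol", "Funny_field", "SMTP_Reply_Code", "Session_Number" ]
    }
    # The sample message ends in "\r", so trim whitespace from the last column.
    mutate {
      strip => [ "Session_Number" ]
    }
  }
}
```

Compared with grok, this does not validate the shape of each field; it only splits on the separator, which may be enough for a fixed-format log like this one.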


Thanks, that sorted it and the filter is now working.

Here is what I have ended up with:

filter {
  if [type] == "hmail" {
    mutate {
      add_tag => [ "hmail" ]
    }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s%{EMAILADDRESS:Sender_Address}\s%{EMAILADDRESS:Recipient_Address}\s%{IPORHOST:Sender_Host}\s%{IPORHOST:Relayed_Host}\s%{WORD:Protocol}\s%{NOTSPACE:Funny_field}\s%{NUMBER:SMTP_Reply_Code}\s%{NUMBER:Session_Number}" }
    }
    date {
      match => [ "timestamp", "ISO8601" ]
      remove_field => [ "timestamp" ]
    }
  }
}
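
For anyone wanting to try a filter like this without the TCP input, a minimal test harness might look like the following sketch. It is not part of the setup above: it assumes sample lines are pasted or piped into stdin, and it sets the type on the stdin input so the conditional in the filter still matches.

```
input {
  stdin {
    # Match the conditional used in the filter block.
    type => "hmail"
  }
}
filter {
  # Paste the filter block under test here.
}
output {
  stdout {
    # Print every parsed event with all of its fields and tags.
    codec => rubydebug
  }
}
```

Any event that still carries the _grokparsefailure tag in this output did not match the expression.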