Multiline filter/Codec question

Hello,
I'm working with Exim email logs and I'm stuck on multiline handling.
Here is an example from the log:

2017-04-03 02:19:58 H=(localhost) [117.0.54.236] F=test@earatt.net rejected RCPT smityrd@example.com: Rejected message because 117.0.54.236 is in a black list at huzkzg6n5flrulopcolvmnfhty.zen.dq.spamhaus.net
2017-04-03 02:19:58 unexpected disconnection while reading SMTP command from (localhost) [117.0.54.236] (error: Connection reset by peer)
2017-04-03 02:19:58 dovecot_login authenticator failed for (ylmf-pc) [104.247.196.7]: 535 Incorrect authentication data (set_id=@example.com)
2017-04-03 02:19:58 no host name found for IP address 58.187.167.240
2017-04-03 02:19:58 1cuy9W-000F1j-JS DKIM: d=stratr.com s=k1 c=relaxed/relaxed a=rsa-sha1 b=1024 i=noreply@stratfor.com [verification succeeded]
2017-04-03 02:19:58 1cuy9W-000F1j-JS <= bounce-mc.us4_7958185.306033-test=example.com@mail10.4.rsgsv.net H=mail10.atl11.rsgsv.net [205.201.133.10] P=esmtp S=21651 id=7478.rsgsv.net T="How Japan Got Baseball"
2017-04-03 02:19:58 no host name found for IP address 46.29.251.135
2017-04-03 02:19:58 1cuy9W-000F1j-JS => test test@example.com R=mysql_user T=mysql_delivery
2017-04-03 02:19:58 1cuy9W-000F1j-JS Completed
I was trying to use the following Logstash config:

filter {
  if [type] == "eximlog" {
    mutate {
      add_field => {
        "message_1" => "%{message}"
      }
    }
    multiline {
      patterns_dir => "/etc/logstash/patterns/"
      pattern => "%{DATE} %{TIME} %{HOSTNAME:exim_msg_id} (=>|Completed)"
      negate => false
      what => "previous"
    }

    grok {
      patterns_dir => "/etc/logstash/patterns/"
      break_on_match => false
      match => [
        "message_1", "%{DATE} %{TIME} %{HOSTNAME:exim_msg_id} %{GREEDYDATA}"
      ]
      match => [ 'message', '%{EXIM_ALL_RULES}' ]
    }

    # remove fields
    mutate {
      remove_field => [ 'host', 'offset' ]
    }

    # Remove the really, really dirty hack to workaround bug in grok code
    # which won't handle multiple matches on the same field
    mutate {
      remove_field => [ "message_1" ]
    }
  }
}

Each message spans at least 3 lines whose only common values are the date and the message ID.
The problem I've run into is that other log entries unrelated to that message still get included. For example, in the log above there is an additional entry "2017-04-03 02:19:58 no host name found for IP address 46.29.251.135" between the rows related to 1cuy9W-000F1j-JS.

Please advise.

(The subject of your post talks about the multiline codec but you're using the deprecated multiline filter.)

You can't use multiline for this purpose for the reason you've just discovered. Have you looked at the aggregate filter?
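
As a rough illustration of the aggregate approach, here is a minimal sketch, not a tested config. It assumes the message ID always follows the timestamp, that "Completed" is the last line logged for a message, and that Logstash runs with a single pipeline worker (`-w 1`), which the aggregate filter needs so that events for one task are processed in order. The field names `exim_rest` and `exim_lines` are made up for this example.

```
filter {
  if [type] == "eximlog" {
    # Pull out the Exim message ID; lines without one (e.g. "no host name found")
    # simply do not match and are left alone.
    grok {
      match => { "message" => "^%{TIMESTAMP_ISO8601} (?<exim_msg_id>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) %{GREEDYDATA:exim_rest}" }
      tag_on_failure => []
    }

    if [exim_msg_id] {
      if [exim_rest] == "Completed" {
        # Last line for this message ID: attach everything collected so far
        # to this event and close the task.
        aggregate {
          task_id => "%{exim_msg_id}"
          code => "event.set('exim_lines', (map['lines'] || []) + [event.get('message')])"
          end_of_task => true
        }
      } else {
        # Any other line for this message ID: remember it in the task's map.
        aggregate {
          task_id => "%{exim_msg_id}"
          code => "map['lines'] ||= []; map['lines'] << event.get('message')"
        }
      }
    }
  }
}
```

Because events are grouped by `exim_msg_id` rather than by position in the file, unrelated lines that land between two rows of the same message are not pulled in.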

It looks like you are using the multiline filter (which has been deprecated) rather than the multiline codec (which you configure as part of each input).

Where are you getting the logs? Which input plugins are you using? Which version of Logstash are you using?

I'm sorry, you're right. I had the same issues when I tried the codec. I will update the subject.
I haven't looked into the aggregate filter yet; thanks for the advice.

Here is my input section that I have tried before:
input {
  beats {
    port => "6001"
    codec => multiline {
      pattern => "%{DATE} %{TIME} %{EXIM_MSGID:exim_msg_id} (=>|Completed)"
      negate => false
      what => previous
    }
  }
}

And the pattern for EXIM_MSGID:
EXIM_MSGID [0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}

I'm pushing multiple log types from the email server to the ELK server using Filebeat (Filebeat version 5.2.2).
All other software is running the latest version: 5.3.0.
I'm using a filter because I couldn't figure out how to apply the multiline codec on the input to just that one log type.

Do not use the multiline codec together with the beats input. Join multiline events at the source, i.e. with Filebeat.
