Filebeat, SMTP headers, and multiline flush_pattern (filebeat 6 beta 1)

bjorn · July 18, 2017, 6:42am

I'd like to extract the SMTP headers from my honeypot's logs, using multiline. I was hoping to be able to carve out the headers only, using flush_pattern. Below is an example email:

From user@example.com  Tue Jul 18 00:48:24 2017
Return-Path: <user@example.com>
Envelope-To: user@example.net
Received: from victim ([10.10.10.10])
	by cheater (INetSim) with ESMTPA id 1DC4FA
	for <user@example.net>; Mon, 17 Jul 2017 22:48:24 -0000
X-INetSim-Id: <1DC4FA-6751606e8bdd31d6231fb52867fff7e9a8f4090a@darkstar.example.org>
X-Mailer: testtt
From: user@example.com
Content-Type: text/plain
To: user@example.com
Subject: Some subject

This is some body text

As per the RFC, there's an empty line after the last header. I was hoping to match that empty line with a suitable regexp, but no such luck. So far I've tried '^$', '\n\n', '^\n' and a few more.

This is my Filebeat config for this prospector:

filebeat:
  prospectors:
    -
      input_type: stdin
      document_type: inetsim-smtp-test
      tags: ["inetsim-smtp-test"]
      multiline.pattern: '^From '
      multiline.negate: true
      multiline.match: after
      multiline.flush_pattern: '\n\n'

When feeding Filebeat log data, the multiline config picks up the whole email; it doesn't stop after the headers. In the resulting event object stored in Elasticsearch, I see \n\n between the last header and the email body when viewing the event as JSON. A binary dump of the log file itself also shows \n\n at that location.
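
A minimal sketch of one possible reason '\n\n' and '^\n' never match, assuming Filebeat applies multiline patterns to one line at a time after the line ending has been stripped (that assumption is not confirmed in this thread). Python's re module stands in for Go's regexp here, and lines_matching is a hypothetical helper, not a Filebeat function. The sketch only shows which patterns can match a single line; it does not model how flush_pattern behaves once a line does match.

```python
import re

# Shortened version of the example email; the raw text does contain "\n\n".
raw = (
    "From user@example.com  Tue Jul 18 00:48:24 2017\n"
    "Subject: Some subject\n"
    "\n"
    "This is some body text\n"
)

# Assumption: each pattern is tested against single lines without line endings.
lines = raw.splitlines()


def lines_matching(pattern, lines):
    """Return the indexes of the lines that the pattern matches."""
    return [i for i, line in enumerate(lines) if re.search(pattern, line)]


for pattern in (r"^$", r"\n\n", r"^\n"):
    print(repr(pattern), lines_matching(pattern, lines))
```

Under that assumption, only '^$' can match the blank separator line, because no individual line contains a newline character. The \n\n seen in the stored event would then come from the lines being joined again when the multiline event is assembled.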

I am quite convinced that this should have worked, but as it doesn't, I'm seeking advice here. If nothing else, then as a bug report for the 6.x development.

# /usr/local/bin/filebeat --version
filebeat version 6.0.0-beta1-git9dc6d52 (arm), libbeat 6.0.0-beta1-git9dc6d52

steffens · July 18, 2017, 11:04pm

The mix of pattern, negate and flush_pattern might be the problem.

E.g. see this sample without flush_pattern: https://play.golang.org/p/xcdssBnP8A
I set negate: true and pattern: '^$'.

You have one email per file?

Do you plan to index the body, or do you just want to extract the header? Filebeat has no state machine or parser, and it will continue processing until the end of the file. If all you want is the header, use include_lines (also a regex) to match typical content in your headers.

bjorn · July 19, 2017, 7:22am

Thanks for responding! The suggested settings were taken from the (alpha) documentation for filebeat (https://www.elastic.co/guide/en/beats/filebeat/master/filebeat-configuration-details.html).

I have one file with multiple mails in it; the file conforms to the regular "mbox" format (https://en.wikipedia.org/wiki/Mbox). Each email's headers start with "From ", the headers end with a blank line, and then arbitrary mail body content follows until the next "From " (note the space).

In other words, I wish to extract the contents from and including "From " to the first empty line, every time it occurs in the file.
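
The extraction described above amounts to a small two-state parser. The following is a rough sketch in plain Python of that intent, not of anything Filebeat provides; extract_header_blocks is a hypothetical name, and the two-message mbox text is made up for illustration.

```python
def extract_header_blocks(lines):
    """Collect each run of lines from a "From " line up to the next empty line."""
    blocks = []
    current = None  # None means we are outside a header block (in a body)
    for line in lines:
        if current is None:
            # Outside a header block: only a "From " line starts a new one.
            if line.startswith("From "):
                current = [line]
        elif line == "":
            # The first empty line ends the header block; the body is skipped.
            blocks.append(current)
            current = None
        else:
            current.append(line)
    if current is not None:
        # The input ended while still inside a header block.
        blocks.append(current)
    return blocks


# Hypothetical two-message mbox content.
mbox = (
    "From a@example.com  Tue Jul 18 00:48:24 2017\n"
    "Subject: one\n"
    "\n"
    "body one\n"
    "\n"
    "From b@example.com  Tue Jul 18 00:49:00 2017\n"
    "Subject: two\n"
    "\n"
    "body two\n"
)

for block in extract_header_blocks(mbox.splitlines()):
    print(block)
```

This sketch treats any body line that begins with "From " as the start of a new message, so it relies on the mbox writer escaping such lines.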

For this purpose I have no interest in the email body, so I would like to discard it before shipping the headers. I could easily drop it in Logstash, but I'd like to trim the content before shipping.

I made some attempts at using include_lines for filtering. My testing assumed that include_lines would be applied before the multiline logic, which I now see is incorrect.

Will make more attempts and update this thread. Thanks again!

bjorn · July 19, 2017, 9:59am

OK, got it working based on @steffens' suggestions, at least for my simple test mail. Hopefully the config will withstand real-world use as well. The following configuration now works as intended:
include_lines: ['^From [\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:.)* ']
multiline.pattern: '^$'
multiline.negate: true
multiline.match: after

The extensive include_lines regexp matches "From " and an email address, in case any email body should start with "From" as well.
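
A simplified model of how this configuration handles the single test mail, assuming multiline grouping runs first and include_lines is then applied to each assembled event, as noted earlier in the thread. group_multiline is a hypothetical helper that imitates "match: after" only; Python's re stands in for Go's regexp, and the sketch does not claim to reproduce Filebeat's exact behavior on a file with several mails.

```python
import re

INCLUDE = r"^From [\w.+=:-]+@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:.)* "

lines = [
    "From user@example.com  Tue Jul 18 00:48:24 2017",
    "Return-Path: <user@example.com>",
    "Received: from victim ([10.10.10.10])",
    "\tby cheater (INetSim) with ESMTPA id 1DC4FA",
    "Subject: Some subject",
    "",
    "This is some body text",
]


def group_multiline(lines, pattern, negate=True):
    """Model of 'match: after': continuation lines join the preceding event."""
    regex = re.compile(pattern)
    events, current = [], None
    for line in lines:
        matched = bool(regex.search(line))
        # With negate: true, lines that do NOT match continue the event.
        continues = matched != negate
        if current is not None and continues:
            current.append(line)
        else:
            if current is not None:
                events.append("\n".join(current))
            current = [line]
    if current is not None:
        events.append("\n".join(current))
    return events


events = group_multiline(lines, r"^$", negate=True)
# include_lines keeps only the events that match the pattern.
kept = [event for event in events if re.search(INCLUDE, event)]
print(len(events), len(kept))
```

In this model the headers form the first event, and the empty line starts a second event containing the body. The second event begins with the empty line rather than "From ", so the include_lines pattern drops it.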

On the Logstash receiving side, after some sanitization, I'm using the kv filter to separate out the different SMTP headers and it seems to work well.

Thanks again!

steffens · July 19, 2017, 2:04pm

Great that you got it working. In the development branch we have started to enable support for custom prospector types (written in Go). Feel free to open a feature request for the mbox file format here: https://github.com/elastic/beats/issues
