Parsing mail from (AWS) S3

I have emails (text files) in S3 that are being forwarded there by SES. These are simple text files, each containing a single email. My Logstash config is below:

```
input {
  s3 {
    access_key_id => "xxxxxxx"
    secret_access_key => "dxxxxxx"
    bucket => "fs-mailit"
    type => "s3-access"
    region => "us-west-2"
    sincedb_path => "/tmp/mailit-logs"
    interval => 120
    delete => true
  }
}

output {
  stdout { codec => rubydebug }
}
```

It works, but it reads one line at a time, producing events like:

```
{
    "@timestamp" => 2017-06-26T04:31:22.584Z,
      "@version" => "1",
       "message" => "Return-Path: jackson@gmail.com\r\n",
          "type" => "s3-access"
}
{
    "@timestamp" => 2017-06-26T04:31:22.590Z,
      "@version" => "1",
       "message" => "Received: from mail-qt0-f169.google.com (mail-qt0-f169.google.com [209.85.216.169])\r\n",
          "type" => "s3-access"
}
```

And therein lies the issue. I would like to parse out the usual items (from, to, subject, data) and create a single document in Elasticsearch.

Looking for your advice on the best way to do this.
Is there any way to make the S3 input slurp the entire file into JSON rather than one line at a time?

Maybe add a unique identifier that is persistent across the entire file?

A nice email reader plugin would be really helpful so I don't have to grok each line manually.

Many thanks!
-Steve

The S3 input is line-oriented (see `read_file(filename) do |line|` in its code); it's designed to read log lines from files stashed in S3.

The multiline codec is designed to accumulate and assemble those lines: based on some rules, it joins lines into a larger string of text and puts that into each event. This may be of use to you.
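As a minimal sketch, the codec is attached to the S3 input itself. This assumes every forwarded email starts with a `Return-Path:` line, as in the sample events in the question; the pattern and the `max_lines` and `auto_flush_interval` values are illustrative assumptions, not settings tested against real SES output.

```
input {
  s3 {
    access_key_id => "xxxxxxx"
    secret_access_key => "dxxxxxx"
    bucket => "fs-mailit"
    type => "s3-access"
    region => "us-west-2"
    sincedb_path => "/tmp/mailit-logs"
    interval => 120
    delete => true
    codec => multiline {
      # Assumption: each email begins with a "Return-Path:" header line
      pattern => "^Return-Path:"
      negate => true               # lines that do NOT match the pattern...
      what => "previous"           # ...are appended to the previous event
      max_lines => 10000           # default is 500; longer emails would be split
      auto_flush_interval => 5     # flush a pending event after 5 s with no new lines
    }
  }
}
```

With `negate => true` and `what => "previous"`, a new event starts at each `Return-Path:` line and every following line is glued onto it, so one email becomes one event.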

Your problem will be in configuring the rules. You need to know what characters mark the beginning or end of each email in a "stream" of lines (for this, imagine that all your email files in S3 were concatenated into one very big file). The start or end marker is specified as a regular expression (the codec's `pattern` option); note that the matching line itself is kept in the assembled text rather than consumed.
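Once each email arrives as a single event, the headers mentioned in the question can be pulled out with a grok filter rather than a dedicated email plugin. This is a sketch assuming the header names appear verbatim at the start of a line and are not folded onto continuation lines; the field names `from`, `to`, `subject` and `date` are hypothetical choices.

```
filter {
  grok {
    # In grok's regex engine (Ruby/Oniguruma syntax), ^ matches at the start
    # of any line, so each pattern finds its header inside the joined message.
    match => {
      "message" => [
        "^From: (?<from>[^\r\n]*)",
        "^To: (?<to>[^\r\n]*)",
        "^Subject: (?<subject>[^\r\n]*)",
        "^Date: (?<date>[^\r\n]*)"
      ]
    }
    # Try every pattern instead of stopping after the first match
    break_on_match => false
  }
}
```

Because each pattern returns its first occurrence, a `From:` line quoted further down in the body should not override the real header, which appears first.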

If you post three or four redacted emails here, maybe I can help. Hint: choose very short emails, and put triple backticks on a new line before and after each email when you post them.
