I have emails (plain text files) in S3 that are being forwarded via SES. Each file contains a single email. My Logstash config is below:
input {
  s3 {
    access_key_id => "xxxxxxx"
    secret_access_key => "dxxxxxx"
    bucket => "fs-mailit"
    type => "s3-access"
    region => "us-west-2"
    sincedb_path => "/tmp/mailit-logs"
    interval => 120
    delete => true
  }
}
output {
  stdout { codec => rubydebug }
}
It works, but it reads one line at a time, producing one event per line:
{
     "@timestamp" => 2017-06-26T04:31:22.584Z,
       "@version" => "1",
        "message" => "Return-Path: jackson@gmail.com\r\n",
           "type" => "s3-access"
}
{
     "@timestamp" => 2017-06-26T04:31:22.590Z,
       "@version" => "1",
        "message" => "Received: from mail-qt0-f169.google.com (mail-qt0-f169.google.com [209.85.216.169])\r\n",
           "type" => "s3-access"
}
And therein lies the issue. I would like to parse out the usual items (from, to, subject, date) and create a single document in Elasticsearch.
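For context, here is roughly the end result I'm after, sketched with Python's stdlib email module (the recipient address, subject, and body are made-up sample values):

```python
# Sketch only: turning one raw email file into a single structured document.
# The raw text below is a hypothetical example, not real mail from the bucket.
from email.parser import Parser

raw = (
    "Return-Path: jackson@gmail.com\r\n"
    "From: jackson@gmail.com\r\n"
    "To: steve@example.com\r\n"
    "Subject: Hello\r\n"
    "Date: Mon, 26 Jun 2017 04:31:22 +0000\r\n"
    "\r\n"
    "Body text here.\r\n"
)

msg = Parser().parsestr(raw)

# One document per file, with the usual header fields pulled out.
doc = {
    "from": msg["From"],
    "to": msg["To"],
    "subject": msg["Subject"],
    "date": msg["Date"],
    "body": msg.get_payload(),
}
print(doc)
```

That single `doc` per file is what I'd like to end up with in Elasticsearch, rather than one event per header line.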
Looking for your advice on the best way to do this:
- Is there any way to make the s3 input slurp the entire file into a single event rather than reading one line at a time?
- Or maybe add a unique identifier that is persistent across the entire file, so the lines can be reassembled later?
- A nice email-reader plugin would be really helpful, so I don't have to manually grok each header line.
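On the first question, one idea I've been toying with (untested, just a sketch based on the multiline codec docs) is attaching a multiline codec to the s3 input with a pattern that never matches, so every line is appended to the previous event and the whole file flushes as one event:

```
input {
  s3 {
    # ... same settings as above ...
    codec => multiline {
      # Pattern that never matches any line, so with negate => true
      # every line is merged into the previous event; the final event
      # is flushed by auto_flush_interval.
      pattern => "^___NEVER_MATCHES___"
      negate => true
      what => "previous"
      auto_flush_interval => 5
    }
  }
}
```

Not sure if this is the idiomatic approach, though, or how it behaves with large mailboxes.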
Many thanks!
-Steve