I'm trying to use Logstash to ingest emails, write the messages into Elasticsearch, and write the attachments onto the filesystem (maybe S3 later). My problem is that the attachment files are not being written correctly, so they end up unreadable.
I'm stuck on writing the attachments to the filesystem. Here is a snippet of my `pipelines.yml`:
```
- pipeline.id: email-write-attachments
  config.string: |
    input {
      pipeline { address => "process_attachments" }
    }
    filter {
      # Drop events that don't have attachments
      if ![attachments] {
        drop {}
      }
      # Prune out all fields except attachments and message-id
      prune {
        whitelist_names => ["^attachments$", "message-id"]
      }
      # Split each event into one event per attachment
      split {
        field => "attachments"
      }
      # Decode attachments. Note: not checking the Content-Transfer-Encoding
      # header, just assuming it's always base64.
      # https://discuss.elastic.co/t/filter-decode-from-base64/89282
      ruby {
        init => "require 'base64'"
        code => "event.set('[attachments][body]', Base64.decode64(event.get('[attachments][body]')))"
      }
    }
    output {
      file {
        path => "/work/attachments.out"
      }
      file {
        path => "/work/attachments/%{[message-id]}/%{[attachments][filename]}"
        codec => plain { format => "%{[attachments][body]}" }
      }
    }
```
The gist of the above: take the email messages, grab the attachments, split them into individual events, and write each event (attachment) to its own file. For example, an email could have a PDF and an image as attachments. Those two attachments get split into two events, one for the PDF and one for the image. In the end I want a PDF file and an image file on disk, named using the filenames from the email.
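As a rough illustration of what the split step does, here is a plain-Ruby sketch (the event contents and message-id are made up, and real Logstash events are not bare hashes; this only models the shape):

```ruby
# Hypothetical event shape before the split filter: one email, two attachments.
event = {
  "message-id"  => "<abc123@example.com>",
  "attachments" => [
    { "filename" => "report.pdf", "body" => "JVBERi0x..." },
    { "filename" => "photo.png",  "body" => "iVBORw0K..." }
  ]
}

# The split filter emits one event per array element; every other field
# (here message-id) is copied into each new event.
split_events = event["attachments"].map do |attachment|
  event.merge("attachments" => attachment)
end
```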
All the attachments come in Base64 encoded, so I have a filter step to decode them. I think my issue lies with either the file output plugin or the plain codec. The files being written look binary, but they're corrupt.
As a separate test, I wrote the Base64-encoded attachments to a file, then in a separate script simply read that file, Base64-decoded it, and wrote the result to another file. That works fine, which tells me the issue is in the output part of the pipeline.
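The standalone round-trip check described above can be sketched like this (file name and sample bytes are made up; the key point is writing with `"wb"` so no newline or encoding translation happens):

```ruby
require 'base64'

original = "\x89PNG\r\n\x1a\n".b      # sample binary bytes (a PNG file header)
encoded  = Base64.encode64(original)  # what the email transport hands us

# Decode and write the raw bytes back out. Opening with "wb" avoids any
# newline or encoding translation that could corrupt binary content.
decoded = Base64.decode64(encoded)
File.open("decoded.bin", "wb") { |f| f.write(decoded) }
```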
Other things I tried:
- Specifying the `charset` as `BINARY` had no impact
- Setting the file output flush to `0` had no impact
- In my separate script, I tried opening the file in `a+` mode, like the output file plugin does, and that still worked
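The `a+` experiment from the last bullet can be reproduced in a few lines (file name and payload are made up; `binmode` is set to rule out encoding conversion):

```ruby
require 'base64'

payload = "\xDE\xAD\xBE\xEF".b
encoded = Base64.encode64(payload)

# Append-mode write of the decoded bytes, mirroring how the file output
# plugin opens its target files; binmode guards against transcoding.
File.open("append_test.bin", "a+") do |f|
  f.binmode
  f.write(Base64.decode64(encoded))
end
```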
Any ideas where the problem is and how to fix it?
Thank you