Hi there
I'm using logstash (docker.elastic.co/logstash/logstash:7.4.0) to ingest data into our open data portal and to save said data to files (.csv, .geojson, etc.).
All was well until we got to one of our data sources, which contains a pdf file, encoded in base64, in one of its fields. We get this data through an API, and the pdf data is automagically ingested into a field (json codec in our input{}) as expected (thank you logstash!).
I need to save the content of this field to an actual pdf file. In order to do so, I've tried to use the file{} output plugin, after decoding the data, like this:
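filter {
  ruby {
    # Decode the base64 payload into a new field
    code => "
      event.set('pdf_data', Base64.decode64(event.get('json_field_with_b64_data')))
    "
  }
}
output {
  file {
    path => "data/%{my_file_name}.pdf"
    # Write only the decoded field (same name as set in the filter above)
    codec => plain { format => "%{pdf_data}" }
  }
}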
What I get is a .pdf file with, apparently, the right contents (if I open it with a text editor), but which turns into a blank page when I open it with Acrobat Reader. I have also tried the line codec instead of plain, with the same result.
I've been searching the documentation looking for other output plugins, and other codecs, but I haven't been able to find an alternative. I have not found any similar case here either.
Maybe a ruby filter could do the writing instead? Has anyone done something similar? Is it even possible to write a pdf file from logstash? Any ideas or pointers will be appreciated.
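To make the ruby idea concrete, here is a rough sketch of what I have in mind, written as plain Ruby so it can be tried outside logstash. My guess (unconfirmed) is that the codec treats the decoded bytes as text and re-encodes them, which would corrupt binary data, so the sketch writes the bytes in binary mode instead. The helper name write_pdf_from_base64 is made up for illustration; inside a ruby filter I assume it would be called with event.get('json_field_with_b64_data') and a path built from event.get('my_file_name').

```ruby
require 'base64'
require 'fileutils'

# Hypothetical helper (not part of logstash): decode a base64 string
# and write the raw bytes to `path` without any text re-encoding.
def write_pdf_from_base64(b64_data, path)
  # decode64 returns a binary (ASCII-8BIT) string
  bytes = Base64.decode64(b64_data)

  # Create the target directory if it does not exist yet
  FileUtils.mkdir_p(File.dirname(path))

  # binwrite opens the file in binary mode, so the bytes are written
  # exactly as decoded; it returns the number of bytes written
  File.binwrite(path, bytes)
end
```

If this is a sensible direction, the file{} output would no longer be needed for this field, and the decoded data would not have to be stored on the event at all.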
Thanks in advance