Output: how to save a field to a pdf file?

Hi there

I'm using logstash (docker.elastic.co/logstash/logstash:7.4.0) to ingest data into our open data portal and to save that data to files (.csv, .geojson, etc.).

All was well until we got to one of our data sources, which contains a pdf file, encoded in base64, in one of the fields. We get this data through an API and the pdf data is automagically ingested into a field (json codec in our input{}) as expected (thank you logstash!).

I need to save the content of this field to an actual pdf file. In order to do so I've tried to use the file{} output plugin, after decoding the data, like this:

filter {
  ruby {
    code => "
      event.set('pdf_data', Base64.decode64(event.get('json_field_with_b64_data')))
    "
  }
}

output {
  file {
    path => "data/%{my_file_name}.pdf"
    codec => plain {format => "%{pdf_data}"}
  }
}

What I get is a .pdf file that apparently has the right contents (if I open it with a text editor), but which shows a blank page when I open it with Acrobat Reader. I have also tried the line codec instead of plain, with the same result.

I've been searching the documentation looking for other output plugins, and other codecs, but I haven't been able to find an alternative. I have not found any similar case here either.

Maybe a ruby filter could do the writing instead? Has anyone done something similar? Is it even possible to write a pdf file from logstash? Any ideas or pointers will be appreciated.

Thanks in advance

Hi,

I'm still struggling with this. I'd really appreciate it if someone could provide some help.

Thank you

Hi

Can someone help me with this? I'm trying to write pdf data to a pdf file from within logstash.

Thank you.

Hi

I solved it. I moved the writing from the output{} to the filter{}, following the code at the end of this post: https://dev.to/gadimbaylisahil/turn-base64-images-into-pdf-in-ruby-with-prawn-3o04.

The code, if anyone is interested, looks like this (names changed to protect the innocent):

ruby {
  init => "require 'base64'"
  code => "
    # Build the output path from the event's file-name field
    path_pdf = 'data/' + event.get('my_file_name') + '.pdf'
    # 'wb' writes raw bytes; the block form closes the file automatically
    File.open(path_pdf, 'wb') do |file|
      file.write(Base64.decode64(event.get('json_field_with_b64_data')))
    end
  "
  remove_field => ["json_field_with_b64_data"]
}

This writes a .pdf file with the right contents that can be opened with Adobe Acrobat Reader with no issues.
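
For anyone who wants to try the decode-and-write step outside logstash, here is a minimal standalone Ruby sketch of the same idea. The helper names (write_pdf, lossy_text_copy) and the sample bytes are made up for illustration and are not part of logstash. The second helper rests on an assumption the thread does not confirm: that the blank pages came from the binary data being treated as UTF-8 text somewhere on the output{} path.

```ruby
require 'base64'
require 'fileutils'
require 'tmpdir'

# Decode a base64 string and write the raw bytes to <dir>/<name>.pdf.
# File.binwrite opens the file in binary mode, so no text transcoding happens.
# File.basename drops any directory part from the name before it is used.
def write_pdf(b64_data, dir, name)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{File.basename(name)}.pdf")
  File.binwrite(path, Base64.decode64(b64_data))
  path
end

# Assumption for illustration only: mimic a text pipeline that reads the bytes
# as UTF-8 and replaces every invalid sequence with U+FFFD.
def lossy_text_copy(bytes)
  bytes.dup.force_encoding('UTF-8').scrub("\uFFFD")
end

# Hypothetical sample: a PDF header line followed by bytes that are not valid UTF-8.
SAMPLE_BYTES = "%PDF-1.4\n\xE2\xE3\xCF\xD3\n".b
SAMPLE_B64 = Base64.strict_encode64(SAMPLE_BYTES)

Dir.mktmpdir do |dir|
  path = write_pdf(SAMPLE_B64, dir, 'example')
  puts "binary write preserved the bytes: #{File.binread(path) == SAMPLE_BYTES}"
end
puts "text copy preserved the bytes: #{lossy_text_copy(SAMPLE_BYTES).bytes == SAMPLE_BYTES.bytes}"
```

If that assumption holds, it would explain why the file looked fine in a text editor (the readable parts survive) while the binary streams inside it did not.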

Hope this can help someone in the future.
