S3 output - Corrupted gzip file on abnormal shutdown

Hi,

I posted an issue on GitHub but I thought I might as well ask for some help here:

  • Description:

When Logstash shuts down abnormally, the files that the s3 output created in the temporary directory are kept and automatically uploaded on the next startup (when the restore option is set to true).

This is usually fine, but when the encoding => "gzip" option is set, the saved gzip file may be corrupted, and it is uploaded to S3 in that state.

$ zcat ls.s3.18e199e7-6bf9-4a82-ad65-5fbc3d34ccce.2018-08-02T14.06.part3.txt.gz >/dev/null
zcat: ls.s3.18e199e7-6bf9-4a82-ad65-5fbc3d34ccce.2018-08-02T14.06.part3.txt.gz: unexpected end of file
zcat: ls.s3.18e199e7-6bf9-4a82-ad65-5fbc3d34ccce.2018-08-02T14.06.part3.txt.gz: uncompress failed

The s3 input plugin cannot read these files; it throws the following error:

Error: Unexpected end of ZLIB input stream

And all records from that file/batch are discarded.
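
For illustration, here is a minimal Python sketch of how the readable part of such a truncated file could be recovered instead of dropping the whole batch. It is not what the s3 input plugin does; the function names (salvage_complete_lines, read_events) are hypothetical, and it assumes one event per line and that the file was simply cut short rather than damaged in the middle.

```python
import gzip
import zlib


def salvage_complete_lines(data: bytes, chunk_size: int = 4096) -> bytes:
    """Decompress as much of a truncated gzip stream as possible.

    Returns only whole lines; a trailing partial line is dropped.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    recovered = bytearray()
    for start in range(0, len(data), chunk_size):
        try:
            recovered += decompressor.decompress(data[start:start + chunk_size])
        except zlib.error:
            # Corrupt bytes in the middle of the stream: keep what we have so far.
            break
    # Cut after the last newline so a half-written event is not returned.
    return bytes(recovered[: recovered.rfind(b"\n") + 1])


def read_events(data: bytes) -> bytes:
    """Return the full content if the file is intact, else the salvageable part."""
    try:
        return gzip.decompress(data)
    except (EOFError, OSError, zlib.error):
        return salvage_complete_lines(data)
```

A streaming decompressor does not fail on a missing end-of-stream marker; it just stops producing output, which is why the events written before the kill can usually still be read. Anything after the cut is lost either way.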

This is problematic since it means we cannot rely on persistent queues to ensure no data is lost.

  • Version: 4.1.4
  • Operating System: Docker docker.elastic.co/logstash/logstash:6.3.2
  • Options: encoding => "gzip", restore => "true"
  • Steps to Reproduce:
    • Launch a logstash instance with an input plugin that receives a flow of events.
    • Configure s3 output plugin with options encoding => "gzip" and restore => "true"
    • When a gzip file appears in /tmp/logstash, kill the logstash instance abruptly, e.g. with docker exec -it logstash kill -KILL 1 if you are under docker.
    • Inspect the temporary file in /tmp/logstash that will be uploaded to S3 on the next startup. It will most likely be corrupted, e.g.:
    $ zcat ls.s3.18e199e7-6bf9-4a82-ad65-5fbc3d34ccce.2018-08-02T14.06.part3.txt.gz >/dev/null
    zcat: ls.s3.18e199e7-6bf9-4a82-ad65-5fbc3d34ccce.2018-08-02T14.06.part3.txt.gz: unexpected end of file
    zcat: ls.s3.18e199e7-6bf9-4a82-ad65-5fbc3d34ccce.2018-08-02T14.06.part3.txt.gz: uncompress failed
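
The inspection step can also be scripted. The following is a rough Python sketch that does the same check as zcat for every .gz file under a directory; is_intact and find_corrupted are hypothetical helper names, not part of Logstash or the plugin.

```python
import gzip
import zlib
from pathlib import Path


def is_intact(path: Path, chunk_size: int = 1 << 16) -> bool:
    """Return True if the whole gzip file decompresses without error."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the end so the trailer (CRC and length) is checked too.
            while stream.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream. OSError: bad header or CRC mismatch.
        return False


def find_corrupted(directory: str) -> list:
    """List the .gz files under `directory` that fail to decompress."""
    return sorted(p for p in Path(directory).rglob("*.gz") if not is_intact(p))
```

For example, find_corrupted("/tmp/logstash") should list the part files that zcat rejects with "unexpected end of file".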
    
