yodog
(Yodog)
July 5, 2021, 4:59pm
opened 06:16AM - 27 Feb 18 UTC
We are reading lots of data from Elasticsearch every day and compressing it to a… gzip package for long term archiving. I wrote a pipeline config using the file output and noticed that it has a handy feature to produce a compressed file without a need for logrotate.
Unfortunately, I am experiencing an issue with it. The size of the log file we are producing daily is around 400 - 700 MB uncompressed, and every time I try to uncompress the gzipped package produced by this plugin, gunzip fails with "unexpected end of file".
I can see that it has all the data in it using
`gzip -cd logfile.log.gz | head`
and
`gzip -cd logfile.log.gz | tail`
but it is clearly missing the footer (of the gzip package). I was poking around and found this (http://ruby-doc.org/stdlib-2.4.0/libdoc/zlib/rdoc/Zlib/GzipWriter.html):
> NOTE: Due to the limitation of Ruby's finalizer, you must explicitly close GzipWriter objects by Zlib::GzipFile#close etc. Otherwise, GzipWriter will be not able to write the gzip footer and will generate a broken gzip file.
I checked the source of this output and didn't see the GzipWriter object's close method being called anywhere, so I guess this could be the reason it is failing?
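To make the quoted note concrete, here is a minimal sketch of the same failure mode. It uses Python's `zlib` module as a stand-in, not the plugin's Ruby code, so the helper name `gzip_stream` and the payload are purely illustrative. The assumption is that an unclosed writer behaves like a stream that was flushed but never finished: all the data is there, but the trailer (CRC32 plus length) is missing.

```python
import gzip
import zlib


def gzip_stream(payload: bytes, finish: bool) -> bytes:
    """Return gzip bytes for payload; skip the trailer when finish is False."""
    # wbits=31 selects the gzip container (header plus CRC32/length trailer).
    comp = zlib.compressobj(wbits=31)
    out = comp.compress(payload)
    if finish:
        out += comp.flush(zlib.Z_FINISH)      # final block + 8-byte trailer
    else:
        out += comp.flush(zlib.Z_SYNC_FLUSH)  # data is flushed, no trailer
    return out


payload = b"log line\n" * 1000
complete = gzip_stream(payload, finish=True)
unclosed = gzip_stream(payload, finish=False)

# A properly finished stream round-trips without complaint.
assert gzip.decompress(complete) == payload

# The unclosed stream still contains every byte of the payload...
reader = zlib.decompressobj(wbits=31)
recovered = reader.decompress(unclosed)
assert recovered == payload
assert not reader.eof  # ...but the end-of-stream marker was never written.

# A strict reader refuses the unclosed stream.
try:
    gzip.decompress(unclosed)
except EOFError as exc:
    print("strict reader failed:", exc)
```

This matches the symptom described above: `head` and `tail` on the decompressed output look fine, while a full `gunzip` stops with an end-of-file error.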
I was trying to reproduce the problem with a very small dataset in my own lab environment, but no cigar. But it is failing every day in production environment with real dataset.
With a bit of searching, I was able to find another report of this issue: https://stackoverflow.com/questions/45533870/logstash-gzipped-file-output-results-in-unexpected-end-of-file
So, after a few years, is there any way to work around this problem?
I see the same problem on Logstash 7.13.1:
zgrep '2021' zimbra.log.2021-01-30-1613154813.gz
gzip: zimbra.log.2021-01-30-1613154813.gz: invalid compressed data--format violated
root@logstash:# yum info logstash
Installed Packages
Name : logstash
Arch : x86_64
Epoch : 1
Version : 7.13.1
Release : 1
Size : 596 M
Repo : installed
Summary : An extensible logging pipeline
URL : https://www.elastic.co/logstash
License : Elastic License
Description : An extensible logging pipeline
It might help if you provided a reproducible example. When I run logstash with
input { generator { count => 1 lines => [ '' ] } }
filter { }
output { file { path => "/tmp/foo.txt.gz" gzip => true } }
zgrep and gunzip do not complain about the resulting file. If I run with
input { stdin {} }
filter { }
output { file { path => "/tmp/foo.txt.gz" gzip => true } }
then, as you would expect, zgrep will complain about the file whilst logstash is running, but if I kill -TERM logstash then gunzip no longer complains about the unexpected end of file.
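If the only defect in an archive is the missing trailer, the data can usually be salvaged after the fact. The following is a rough sketch, not something provided by Logstash: `salvage_gzip` and the file names are hypothetical, and it assumes a single-member gzip file whose compressed data is intact. A file with genuinely corrupt data (as in the "format violated" message earlier in the thread) would raise `zlib.error` instead.

```python
import os
import tempfile
import zlib


def salvage_gzip(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> bool:
    """Copy whatever can be decompressed from src_path into dst_path.

    Returns True if the gzip stream ended cleanly, False if it was truncated.
    Only the first gzip member is handled; corrupt data raises zlib.error.
    """
    reader = zlib.decompressobj(wbits=31)  # 31 = expect gzip header/trailer
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while not reader.eof:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # ran out of input before the end-of-stream marker
            dst.write(reader.decompress(chunk))
    return reader.eof


# Demo: build a gzip file that was flushed but never closed, then salvage it.
with tempfile.TemporaryDirectory() as tmp:
    broken = os.path.join(tmp, "broken.log.gz")      # hypothetical file name
    restored = os.path.join(tmp, "restored.log")     # hypothetical file name
    comp = zlib.compressobj(wbits=31)
    with open(broken, "wb") as fh:
        fh.write(comp.compress(b"event\n" * 5) + comp.flush(zlib.Z_SYNC_FLUSH))
    ended_cleanly = salvage_gzip(broken, restored)
    with open(restored, "rb") as fh:
        restored_bytes = fh.read()
```

The return value doubles as an integrity check: `False` means the writer never finished the stream, which is what you would expect for a file that Logstash still has open.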