Processing remote ZIP files with Logstash

arisbanach · February 16, 2018, 8:57pm

I'd just like to verify that I am attempting this correctly. I am trying to start a Logstash pipeline that first downloads some large (~500MB) ZIP files over HTTP, then unzips them, then processes the data. I am having trouble with the first part, though.

Should I use http_poller for this?
I see a gzip codec. Should I be using this with .ZIP, or is that not going to work?
I get these logs when trying with http_poller:

Feb 16 20:50:28 lamp-01 systemd[1]: Started logstash.
Feb 16 20:50:42 lamp-01 logstash[12860]: Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
Feb 16 20:53:00 lamp-01 logstash[12860]: java.lang.OutOfMemoryError: Java heap space
Feb 16 20:53:00 lamp-01 logstash[12860]: Dumping heap to java_pid12860.hprof ...
Feb 16 20:53:00 lamp-01 logstash[12860]: Unable to create java_pid12860.hprof: Permission denied

Any idea why this is happening? These errors appear only a few minutes after starting the pipeline, and I allocated 2GB of memory to Logstash.

Badger · February 16, 2018, 9:12pm

If I read it correctly, a Zlib::GzipReader reads the whole object (500 MB) into memory and decompresses it. I am not surprised it blows up a 2 GB heap. To me it does not appear to be a stream reader.

I would experiment with much smaller files and see what the impact on the verbose GC logs looks like. But then I am a GC nerd.
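To illustrate the distinction between a whole-object reader and a stream reader, here is a minimal Python sketch. It is not the Logstash codec's code; the function names `count_lines_in_memory` and `count_lines_streaming` are made up for the example. The first function decompresses the entire payload at once, so memory grows with the uncompressed size. The second reads fixed-size chunks, so memory stays roughly bounded by the chunk size.

```python
import gzip
import io


def count_lines_in_memory(data: bytes) -> int:
    """Decompress everything at once; memory use grows with the uncompressed size."""
    return gzip.decompress(data).count(b"\n")


def count_lines_streaming(fileobj, chunk_size: int = 64 * 1024) -> int:
    """Decompress chunk by chunk; only about one chunk is held at a time."""
    lines = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            lines += chunk.count(b"\n")
    return lines


# Small demonstration payload (hypothetical data, not from the thread).
sample = gzip.compress(b"one log line\n" * 1000)
whole = count_lines_in_memory(sample)
streamed = count_lines_streaming(io.BytesIO(sample))
```

Both calls return the same count; only the peak memory differs, which is what matters for a 500 MB input.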

arisbanach · February 16, 2018, 9:14pm

I don't have any codec currently specified, though. Just a simple config with http_poller on a cron schedule and a file output location.

magnusbaeck · February 19, 2018, 7:37am

I don't think there's a reasonable way of doing this with Logstash alone. I suggest you run a script alongside Logstash that downloads the zip files, unpacks them, and hands them over to Logstash.
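A companion script along these lines could look roughly like the sketch below. It is only an illustration under assumptions: the function names, the directory layout, and the `.part` suffix are all hypothetical, and it assumes Logstash reads the extracted files from a watched directory whose glob does not match `*.part`. The download and the extraction both copy in chunks, so the 500 MB archive never has to fit in memory.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_to_disk(url: str, dest: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Stream a remote file to disk in fixed-size chunks."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest


def extract_for_logstash(archive: Path, watch_dir: Path) -> list:
    """Extract each archive member into watch_dir, one member at a time."""
    watch_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the base name so an entry cannot escape watch_dir.
            target = watch_dir / Path(info.filename).name
            # Write under a temporary name first (assumed to be ignored by
            # the watcher), then rename once the file is complete.
            tmp = target.with_name(target.name + ".part")
            with zf.open(info) as src, open(tmp, "wb") as dst:
                shutil.copyfileobj(src, dst)
            tmp.replace(target)
            extracted.append(target)
    return extracted
```

You would call `download_to_disk(url, Path("/some/spool/archive.zip"))` and then `extract_for_logstash(...)` from cron or a systemd timer (both paths are placeholders). Flattening to base names means two members with the same file name in different folders would overwrite each other, so adjust that if your archives are structured that way.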

I see a gzip codec. Should I be using this with .ZIP, or is that not going to work?

It's not going to work. Gzip just compresses a single stream of data, whereas zip is an archive file format that holds multiple files and also compresses them.
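The difference is easy to see with Python's standard library, used here purely as an illustration (the payload and member names are made up). A gzip payload is one compressed stream, a zip payload is a container of named entries, and a gzip reader rejects zip data outright.

```python
import gzip
import io
import zipfile

payload = b"hello logstash\n"

# gzip: a single compressed stream with no notion of member files.
gz_bytes = gzip.compress(payload)

# zip: an archive of named entries, each compressed on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.log", payload)
    zf.writestr("b.log", payload)
zip_bytes = buf.getvalue()

# Listing members only makes sense for the zip archive.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    member_names = zf.namelist()

# Feeding zip data to a gzip decoder fails because the header does not match.
try:
    gzip.decompress(zip_bytes)
    gzip_accepts_zip = True
except OSError:
    gzip_accepts_zip = False
```

Here `member_names` holds the two entry names and `gzip_accepts_zip` ends up `False`, which is the same mismatch a gzip codec would hit on a .ZIP download.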

