Feed Logstash with gzipped multiline inputs

Hi All,

I'm implementing an ELK stack for indexing and analysing Java logs. At the moment it's only a proof of concept, and I cannot feed Logstash with plain-text log files: what I have are gzipped historical log files, and I want to index their content.

Given that I cannot combine multiple codecs (gzip_lines and multiline), what is the best way to index them? Should I pre-aggregate the lines and then feed the result to Logstash, or is there another way to reach my goal?

I've written a Python script that reads lines from the gzip files and feeds Logstash via the http input plugin, but I suspect it is not the best solution, judging by how long it takes to index the files.
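Roughly, the script groups lines into entries and posts them one by one, something along the lines of the sketch below. The endpoint URL and the entry-start regex are just placeholders for my real settings (it assumes the http input listens on localhost:8080 and that every entry starts with a timestamp):

```python
import gzip
import re
import requests

# Assumptions: Logstash http input listening on localhost:8080, and each new
# log entry starting with a timestamp such as "2017-01-01 12:00:00,123".
LOGSTASH_URL = "http://localhost:8080"
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def post_entry(entry_lines):
    """Send one (possibly multiline) log entry to Logstash as a JSON document."""
    if entry_lines:
        requests.post(LOGSTASH_URL, json={"message": "\n".join(entry_lines)})

def feed(path):
    current = []
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if ENTRY_START.match(line):     # a new entry begins here
                post_entry(current)
                current = [line]
            else:                           # continuation line, e.g. a stack trace
                current.append(line)
    post_entry(current)                     # flush the last entry

feed("app.log.2016-12-31.gz")
```

Each entry becomes a separate HTTP request, which is probably a large part of the slowness.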

Thank you in advance for suggestions.

I suggest that you use your Python script to unzip the files into another folder and have Filebeat read the unzipped files and send them to the beats input. Filebeat has support for multiline.
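The unzipping part can stay very small; something like the sketch below, where both paths are just placeholders. Filebeat then tails the uncompressed files, and its multiline settings join stack-trace lines into single events before they reach the beats input.

```python
import gzip
import shutil
from pathlib import Path

# Placeholder paths: the archive folder and the folder Filebeat is watching.
SRC_DIR = Path("/var/log/archive")
DST_DIR = Path("/var/log/unzipped")

DST_DIR.mkdir(parents=True, exist_ok=True)
for gz_path in SRC_DIR.glob("*.gz"):
    out_path = DST_DIR / gz_path.stem           # app.log.gz -> app.log
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)            # stream-decompress, no full file in memory
```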

Thank you very much for the suggestion.

Meanwhile I've implemented a Java program that extracts log entries (taking multiline entries into account as well) and pushes them into a Redis list. Logstash then reads those entries from Redis and indexes them into Elasticsearch.
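The push side looks roughly like the Python sketch below (my actual program is in Java); the grouping loop is the same as in my earlier script, only the output changes from an HTTP POST to an RPUSH. The key name is a placeholder and has to match whatever the Logstash redis input (data_type => "list") is configured to read:

```python
import json
import redis

# Assumption: a local Redis and a list key ("java_logs") that the Logstash
# redis input reads with data_type => "list"; both names are placeholders.
r = redis.Redis(host="localhost", port=6379)

def push_entry(entry_lines):
    """Push one complete (possibly multiline) log entry onto the Redis list."""
    if entry_lines:
        r.rpush("java_logs", json.dumps({"message": "\n".join(entry_lines)}))
```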

It works, but I haven't measured performance yet.

This way I know when all lines of a gzip file have been processed. With the solution you suggested, is there a way to know when a file has been fully processed? The goal is to remove the uncompressed file as soon as its content has been loaded somewhere and is ready to be indexed.

Your solution is fine.

If you need more performance you can push alternate files to two Redis instances and use two Logstash instances, one reading from each Redis, both outputting to the same ES index.
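A rough sketch of the alternating part, with placeholder hosts, ports and key name (the multiline grouping from your loader is left out here for brevity):

```python
import gzip
import itertools
import json
import redis

# Placeholder connection details: two Redis instances, each read by its own
# Logstash instance (redis input, data_type => "list") writing to the same ES index.
REDIS_NODES = [
    redis.Redis(host="localhost", port=6379),
    redis.Redis(host="localhost", port=6380),
]
KEY = "java_logs"

def push_file(path, node):
    """Push every line of one gzipped file onto the given Redis instance."""
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            node.rpush(KEY, json.dumps({"message": line.rstrip("\n")}))

gz_files = ["app.log.2016-12-29.gz", "app.log.2016-12-30.gz", "app.log.2016-12-31.gz"]

# Alternate whole files between the two instances so both pipelines stay busy.
for node, path in zip(itertools.cycle(REDIS_NODES), gz_files):
    push_file(path, node)
```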

I will try with both one and two Redis instances to evaluate the performance. If the topic is still open, I'll post the benchmark results here.

Thank you for your help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.