I have a few million JSON files that I have separated into multiple subfolders. Basically these are CloudTrail data for a year. My Logstash version is 5.5 and this is an 8 core, 32 GB system. The problem is that when I run Logstash with a file input plugin and an Elasticsearch output, it runs for a while and then dies with a Java heap space error. Can someone please help me with this? I am running out of ideas.
The file input isn't built to process filename patterns that expand to millions of files. You'll have to process them in smaller numbers, e.g. by writing a small script that reads the millions of files and copies the data to a small(er) set of files that you point Logstash to, as in the sketch below. Another option could be to send the files to Logstash over a socket or a broker. A broker like RabbitMQ will help you with backpressure if Logstash isn't able to consume the messages fast enough.
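Something along these lines would work as the "small script" above. This is only a rough sketch: the paths, the batch size, and the assumption that each file is a single JSON document (optionally with a CloudTrail "Records" array) are mine, so adjust them to your own layout.

```python
#!/usr/bin/env python3
# Rough sketch: concatenate millions of small JSON files into a much smaller
# set of newline-delimited batch files that Logstash's file input can tail.
# SRC_DIR, DST_DIR and BATCH_SIZE are assumptions -- adjust to your layout.
import json
from pathlib import Path

SRC_DIR = Path("/data/cloudtrail")          # root of the per-day subfolders (assumed)
DST_DIR = Path("/data/cloudtrail-batches")  # where the batch files go (assumed)
BATCH_SIZE = 50_000                         # events per output file (assumed)

DST_DIR.mkdir(parents=True, exist_ok=True)

batch_no = 0
count = 0
out = open(DST_DIR / f"batch-{batch_no:05d}.ndjson", "w")

for path in SRC_DIR.rglob("*.json"):
    with path.open() as f:
        doc = json.load(f)
    # CloudTrail log files wrap the events in a "Records" array;
    # fall back to the whole document if that key is absent.
    for record in doc.get("Records", [doc]):
        out.write(json.dumps(record) + "\n")
        count += 1
        if count >= BATCH_SIZE:
            out.close()
            batch_no += 1
            count = 0
            out = open(DST_DIR / f"batch-{batch_no:05d}.ndjson", "w")

out.close()
```

You'd then point the file input at /data/cloudtrail-batches/*.ndjson with a json codec, which keeps the number of files Logstash has to track small.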
Can you recommend a solution for *.gz files coming in from Amazon CloudTrail? The archives contain JSON files, and those JSON files don't have any newlines in them. Is there a way I can ingest the files without having to unpack the archives and add a newline to every JSON document? I am having a hard time processing this data.
If these files are among the million files, you'll probably have to process them outside of Logstash anyway, so I don't know if it's such a big problem. Not sure what you mean by "doesn't have a new line in it". Are the files lacking a trailing newline or what?
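If you do end up unpacking them outside of Logstash, something like this rough sketch is all it takes to turn each archive into newline-delimited JSON that the file input can read with a json codec. Again, the paths and the assumption that each archive holds one JSON document with a "Records" array are mine, not something from your setup.

```python
#!/usr/bin/env python3
# Rough sketch: unpack CloudTrail *.json.gz archives and rewrite each entry
# of the "Records" array as one JSON object per line (NDJSON).
# SRC_DIR and DST_DIR are assumptions -- adjust to where your files land.
import gzip
import json
from pathlib import Path

SRC_DIR = Path("/data/cloudtrail-gz")      # incoming *.gz files (assumed)
DST_DIR = Path("/data/cloudtrail-ndjson")  # newline-delimited output (assumed)
DST_DIR.mkdir(parents=True, exist_ok=True)

for gz_path in SRC_DIR.rglob("*.json.gz"):
    with gzip.open(gz_path, "rt") as f:
        doc = json.load(f)
    # foo_CloudTrail_....json.gz -> foo_CloudTrail_....jsonl
    out_path = DST_DIR / (gz_path.stem + "l")
    with out_path.open("w") as out:
        for record in doc.get("Records", []):
            out.write(json.dumps(record) + "\n")
```

Combined with the batching approach above, this keeps the heavy lifting out of Logstash entirely.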