How to process multiple files in Logstash

I have a few million JSON files that I have separated into multiple subfolders; basically, this is CloudTrail data for a year. My Logstash version is 5.5, and this is an 8-core, 32 GB system. The problem is that when I run Logstash, which uses a file input plugin and outputs to Elasticsearch, it runs for a while and then dies with a Java heap space error. Can someone please help me with this? I am running out of ideas.
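
For reference, a minimal sketch of the kind of pipeline described above (the paths, sincedb location, codec, and index name are placeholders I am assuming, not details from the original post):

```conf
input {
  file {
    # Glob that expands to millions of small CloudTrail files (placeholder path)
    path => "/data/cloudtrail/**/*.json"
    sincedb_path => "/var/lib/logstash/sincedb_cloudtrail"
    start_position => "beginning"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cloudtrail-%{+YYYY.MM.dd}"
  }
}
```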

--
Niraj

The file input isn't built to process filename patterns that expand to millions of files. You'll have to process them in smaller numbers, e.g. by writing a small script that reads the millions of files and copies the data to a small(er) set of files that you point Logstash to. Another option could be to send the files to Logstash over a socket or a broker. A broker like RabbitMQ will help you with backpressure if Logstash isn't able to consume the messages fast enough.
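
A small batching script along the lines suggested above might look like this. This is a sketch, assuming the source files live under `./cloudtrail` and the batches go to `./batches`; both paths and the batch size are hypothetical:

```shell
#!/bin/sh
# Sketch: concatenate many small JSON files into a smaller set of batch
# files that Logstash's file input can reasonably be pointed at.
src=./cloudtrail    # assumed location of the small JSON files
dst=./batches       # assumed output directory for the batch files
per_batch=10000     # files per output batch; tune to taste

mkdir -p "$dst"
batch=0
count=0
find "$src" -name '*.json' | while read -r f; do
  cat "$f" >> "$dst/batch_$batch.json"
  printf '\n' >> "$dst/batch_$batch.json"   # ensure each event ends with a newline
  count=$((count + 1))
  if [ "$count" -ge "$per_batch" ]; then
    count=0
    batch=$((batch + 1))
  fi
done
```

You would then point the `path` option of the file input at `./batches/*.json` instead of the original glob.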


Thanks @magnusbaeck.

Can you recommend a solution for the case where I have *.gz files coming in from Amazon CloudTrail? These archives contain JSON files, and the JSON files don't have a newline in them. Is there a way I can ingest the files without having to unpack every archive and add a newline to every JSON file? I am having a hard time processing this data.

--
Niraj

If these files are among the million files, you'll probably have to process them outside of Logstash anyway, so I don't know if it's such a big problem. I'm not sure what you mean by "doesn't have a new line in it". Are the files lacking a trailing newline, or what?

@magnusbaeck Yes, the JSON files don't have an EOL in them.

I have to use echo >> filename.json to add an end of line to each one, and it works after that.
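
The unpack-and-fix step can be scripted in one pass. A sketch, assuming the archives sit under `./incoming` (a placeholder path) and each *.gz contains a single JSON file:

```shell
#!/bin/sh
# Sketch: extract each CloudTrail .gz archive and append the missing
# trailing newline so the file input will emit the final (only) line.
for gz in ./incoming/*.gz; do
  [ -e "$gz" ] || continue          # skip when the glob matches nothing
  json="${gz%.gz}"                  # e.g. event.json.gz -> event.json
  gunzip -c "$gz" > "$json"         # extract, keeping the original archive
  printf '\n' >> "$json"            # add the end-of-line the file lacks
done
```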

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.