How to process multiple files in Logstash

(Niraj Kumar) #1

I have around a million JSON files, split across multiple subfolders. They are a year's worth of CloudTrail data. I am running Logstash 5.5 on an 8-core, 32 GB system. The problem is that when I run Logstash with a file input plugin, outputting to Elasticsearch, it runs for a while and then dies with a Java heap space error. Can someone please help me with this? I am running out of ideas.


(Magnus Bäck) #2

The file input isn't built to process filename patterns that expand to millions of files. You'll have to process them in smaller numbers, e.g. by writing a small script that reads the millions of files and copies the data to a small(er) set of files that you point Logstash to. Another option could be to send the data to Logstash over a socket or a broker. A broker like RabbitMQ will help you with backpressure if Logstash isn't able to consume the messages fast enough.
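
A minimal sketch of the "small script" idea, in Python. It assumes each input file holds exactly one JSON document and has a .json extension; the function name consolidate, the batch-NNNNNN.ndjson naming, and the default batch size are made up for illustration and are not anything Logstash requires.

```python
import json
import os
from pathlib import Path


def consolidate(src_root, dest_dir, files_per_batch=10000):
    """Merge many small JSON files into fewer newline-delimited files.

    Walks src_root recursively and writes each *.json document as a single
    line into a batch file under dest_dir. Returns the batch files written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    count = 0
    try:
        for dirpath, _dirnames, filenames in os.walk(src_root):
            for name in sorted(filenames):
                if not name.endswith(".json"):
                    continue
                # Start a new batch file every files_per_batch inputs.
                if count % files_per_batch == 0:
                    if out is not None:
                        out.close()
                    batch_path = dest / f"batch-{count // files_per_batch:06d}.ndjson"
                    out = open(batch_path, "w", encoding="utf-8")
                    written.append(batch_path)
                with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                    doc = json.load(f)
                # Re-serialize compactly so each document is one line
                # terminated by a newline.
                out.write(json.dumps(doc, separators=(",", ":")) + "\n")
                count += 1
    finally:
        if out is not None:
            out.close()
    return written
```

Logstash would then be pointed at the handful of batch files instead of the original directory tree. Loading every document with json.load keeps the sketch simple but holds one whole file in memory at a time, which should be fine for small files.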

(Niraj Kumar) #3

Thanks @magnusbaeck.

Can you recommend a solution for this case? I have *.gz files coming in from Amazon CloudTrail. They contain JSON files, and these JSON files don't have a newline in them. Is there a way to ingest the files without having to unpack each archive and add a newline to every JSON file? I am having a hard time processing this data.


(Magnus Bäck) #4

If these files are among the million files you'll probably have to process them outside of Logstash anyway so I don't know if it's such a big problem. Not sure what you mean by "doesn't have a new line in it". Are the files lacking a trailing newline or what?
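
Processing the .gz files outside Logstash could look roughly like the Python sketch below. It assumes each .gz file contains a single JSON document, and that CloudTrail wraps its events in a top-level "Records" list; the function name gz_to_ndjson and the output path are hypothetical. Because every event is written with its own newline, it does not matter whether the original file ended with one.

```python
import gzip
import json


def gz_to_ndjson(gz_path, out_path):
    """Append the events from one gzipped JSON file to out_path, one per line.

    Returns the number of lines written.
    """
    # gzip.open in text mode decompresses on the fly; no unpacked copy
    # of the file is left on disk.
    with gzip.open(gz_path, "rt", encoding="utf-8") as src:
        doc = json.load(src)
    # Assumption: events live in a top-level "Records" list. Any other
    # document shape is written out as a single line.
    if isinstance(doc, dict) and isinstance(doc.get("Records"), list):
        events = doc["Records"]
    else:
        events = [doc]
    with open(out_path, "a", encoding="utf-8") as dst:
        for event in events:
            dst.write(json.dumps(event, separators=(",", ":")) + "\n")
    return len(events)
```

Calling this once per .gz file with the same out_path builds up one newline-delimited file that a line-oriented input can read.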

(Niraj Kumar) #5

@magnusbaeck Yes, the JSON files don't have an EOL at the end.

I have to use echo >> filename.json to add an end of line to it and it works after that.
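
For many files, the same fix can be scripted. This is a small Python sketch of the echo workaround; the function name ensure_trailing_newline is made up. Unlike echo >>, which appends unconditionally, it only appends when the last byte is not already a newline, so running it twice is harmless.

```python
import os


def ensure_trailing_newline(path):
    """Append a newline to the file unless it is empty or already ends with one.

    Returns True if a newline was appended, False otherwise.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: leave it alone
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already terminated
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True
```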
