The above is just a sample of the input section of my logstash.conf file. I have about 78 file input blocks right now, covering only 5 distinct types.
I was wondering whether Logstash would be more efficient if I consolidated all of the prn, csv, etc. inputs so that there is a single file input call per type, i.e. only 5 file input calls total (one for each type)?
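For illustration, a consolidated input section might look something like the sketch below. All paths and type names here are placeholders, not taken from the actual config; the point is just that one file input can take an array of path globs:

```
# Hypothetical consolidation: one file input per type instead of one per file.
# Paths and type names below are placeholders.
input {
  file {
    path => ["C:/data/app1/*.csv", "C:/data/app2/*.csv"]
    type => "csv"
  }
  file {
    path => ["C:/data/app1/*.prn", "C:/data/app2/*.prn"]
    type => "prn"
  }
  # ...one block per remaining type
}
```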
You'd do far better to use Filebeat instead. It's lighter weight and will do better at tracking multiple files. A single instance of Filebeat can have multiple prospectors, and each prospector can track multiple files, presumably of similar layout. Filebeat can also attach arbitrary fields and tags to events coming from a given prospector, so you can act on them differently within Logstash.
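As a sketch, a filebeat.yml with multiple prospectors could look like this (1.x-era syntax; paths, the field value, and the Logstash host are placeholder assumptions):

```
# Sketch of a filebeat.yml with two prospectors; paths and values are placeholders.
filebeat:
  prospectors:
    - paths:
        - "C:/data/*.csv"
      document_type: csv
      fields:
        source_group: reports   # arbitrary field you can match on in Logstash
    - paths:
        - "C:/data/*.prn"
      document_type: prn
output:
  logstash:
    hosts: ["localhost:5044"]
```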
I do not want to install a service. My usage of the ELK stack requires me to be able to have a zipped folder containing all components of ELK. The user will grab the zip folder, unzip it, run 2 batch scripts starting the correct components, do the analyzing and stop the 2 batch scripts when done. The folder can then be deleted from the computer to remove all traces of ELK.
Because of this I do not think I can use Filebeat, since it must be installed on the host computer, unless I am misreading the documentation?
Yes. I have downloaded and unzipped it to the folder I want to run it from. How do I run Filebeat without installing it as a service? The documentation isn't clear on this. Do I simply run the exe file?
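For what it's worth, Filebeat can be run in the foreground straight from the unzipped folder, which would fit the batch-script workflow described above. A minimal batch fragment might look like this (the folder path is a placeholder):

```
rem Run Filebeat in the foreground, no service install needed.
rem -c points at the config file; -e logs to stderr instead of a log file.
cd C:\elk\filebeat
filebeat.exe -c filebeat.yml -e
```

Stopping the script (Ctrl+C or killing the process) stops Filebeat, so nothing persists once the folder is deleted.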
After doing some research and playing with Filebeat I found that it cannot recursively search through all sub directories of a given directory. According to the documentation:
Currently it is not possible to recursively fetch all files in all subdirectories of a directory.
My use case requires this feature, and the file input in Logstash gives it to me. Is this a feature they are looking at adding in a future release?
This also brings me back to my original question:
I was wondering whether Logstash would be more efficient if I consolidated all of the prn, csv, etc. inputs so that there is a single file input call per type, i.e. only 5 file input calls total (one for each type)?
Not necessarily. I'm not 100% certain whether the file input is multithreaded. If it is, then consolidating won't matter much. If it isn't, then having multiple file inputs will help parallelize reads.
Either way, I highly recommend keeping them separated as much as possible, if only so you can specify separate sincedb locations. Otherwise every input hits a single sincedb file, which can slow things down due to file locking and similar contention.
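Concretely, separate sincedb locations could be set per input with the `sincedb_path` option, along these lines (paths are placeholders):

```
# Each input writes read-position state to its own sincedb file,
# avoiding contention on a single shared sincedb. Paths are placeholders.
input {
  file {
    path => "C:/data/*.csv"
    type => "csv"
    sincedb_path => "C:/logstash/sincedb/csv"
  }
  file {
    path => "C:/data/*.prn"
    type => "prn"
    sincedb_path => "C:/logstash/sincedb/prn"
  }
}
```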
I currently have all 78 file inputs writing to a single sincedb file. Would you recommend each file input gets its own sincedb file? Or is it alright if a number of them share the same one?