The above is just a sample of the input section of my logstash.conf file. I have about 78 file input blocks right now, covering only 5 distinct types.
I was wondering whether Logstash would be more efficient if I consolidated all of the prn, csv, etc. inputs so that there is a single file input call per type, i.e. only 5 file input calls total (one for each type)?
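For illustration, a consolidated input section might look something like the sketch below. All paths and type names here are placeholders, not taken from the actual config; the point is just that one file input can take an array of path globs:

```
# Hypothetical consolidation: one file input per type instead of one per file.
# Paths and type names below are placeholders.
input {
  file {
    path => ["C:/data/app1/*.csv", "C:/data/app2/*.csv"]
    type => "csv"
  }
  file {
    path => ["C:/data/app1/*.prn", "C:/data/app2/*.prn"]
    type => "prn"
  }
  # ...one block per remaining type
}
```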
You'd do far better to use Filebeat instead. It's lighter weight and will do better at tracking multiple files. A single instance of Filebeat can have multiple prospectors, and each prospector can track multiple files, presumably of similar layout. Filebeat can also attach arbitrary fields and tags to events coming from a given prospector, so you can act on them differently within Logstash.
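As a sketch, a filebeat.yml with multiple prospectors could look like this (1.x-era syntax; paths, the field value, and the Logstash host are placeholder assumptions):

```
# Sketch of a filebeat.yml with two prospectors; paths and values are placeholders.
filebeat:
  prospectors:
    - paths:
        - "C:/data/*.csv"
      document_type: csv
      fields:
        source_group: reports   # arbitrary field you can match on in Logstash
    - paths:
        - "C:/data/*.prn"
      document_type: prn
output:
  logstash:
    hosts: ["localhost:5044"]
```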
I do not want to install a service. My usage of the ELK stack requires me to be able to have a zipped folder containing all components of ELK. The user will grab the zip folder, unzip it, run 2 batch scripts starting the correct components, do the analyzing and stop the 2 batch scripts when done. The folder can then be deleted from the computer to remove all traces of ELK.
Because of this I do not think I can use Filebeat, since it must be installed on the host computer, unless I am misreading the documentation?
Yes. I have downloaded and unzipped it to the folder I want to run it from. How do I run Filebeat without installing it as a service? The documentation isn't clear on this. Do I simply run the exe file?
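For what it's worth, Filebeat can be run in the foreground straight from the unzipped folder, which would fit the batch-script workflow described above. A minimal batch fragment might look like this (the folder path is a placeholder):

```
rem Run Filebeat in the foreground, no service install needed.
rem -c points at the config file; -e logs to stderr instead of a log file.
cd C:\elk\filebeat
filebeat.exe -c filebeat.yml -e
```

Stopping the script (Ctrl+C or killing the process) stops Filebeat, so nothing persists once the folder is deleted.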
After doing some research and playing with Filebeat I found that it cannot recursively search through all sub directories of a given directory. According to the documentation:
Currently it is not possible to recursively fetch all files in all subdirectories of a directory.
My use case requires this feature, and the file input in Logstash gives it to me. Is this a feature they are looking at adding in a future release?
This also brings me back to my original question:
I was wondering whether Logstash would be more efficient if I consolidated all of the prn, csv, etc. inputs so that there is a single file input call per type, i.e. only 5 file input calls total (one for each type)?
Not necessarily. I'm not 100% certain whether the file input is multithreaded. If it is, then consolidating won't matter much. If it isn't, then having multiple file inputs will help parallelize reads.
Either way, I highly recommend keeping them separated as much as possible, if only so you can specify separate sincedb locations. Otherwise every input hits a single sincedb file, which can slow things down due to file locking and similar contention.
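Concretely, separate sincedb locations could be set per input with the `sincedb_path` option, along these lines (paths are placeholders):

```
# Each input writes read-position state to its own sincedb file,
# avoiding contention on a single shared sincedb. Paths are placeholders.
input {
  file {
    path => "C:/data/*.csv"
    type => "csv"
    sincedb_path => "C:/logstash/sincedb/csv"
  }
  file {
    path => "C:/data/*.prn"
    type => "prn"
    sincedb_path => "C:/logstash/sincedb/prn"
  }
}
```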
I currently have all 78 file inputs writing to a single sincedb file. Would you recommend each file input gets its own sincedb file? Or is it alright if a number of them share the same one?