File Input Plugin Efficiency

In my logstash.conf file I am utilizing the file input plugin.

	file {
		path => "${PWD}/data/**/admin_ui*.csv"
		start_position => beginning
		ignore_older => 0
		sincedb_path => "${PWD}/software_files/logstash-5.6.1/logstash_use/null"
		type => "csv"
	}
	file {
		path => ["${PWD}/data/**/alarm.prn", "${PWD}/data/**/alarm[0-9]*.prn"]
		start_position => beginning
		sincedb_path => "${PWD}/software_files/logstash-5.6.1/logstash_use/null"
		ignore_older => 0
		type => "alarmprn"
		codec => plain { charset => "ANSI_X3.4-1968" }
	}
	file {
		path => "${PWD}/data/**/alarm_buffer*.prn"
		start_position => beginning
		sincedb_path => "${PWD}/software_files/logstash-5.6.1/logstash_use/null"
		ignore_older => 0
		type => "prn"
	}
	file {
		path => "${PWD}/data/**/alrm_server*.prn"
		start_position => beginning
		sincedb_path => "${PWD}/software_files/logstash-5.6.1/logstash_use/null"
		ignore_older => 0
		type => "prn"
	}

The above is just a sample of the input section of my logstash.conf file. I currently have about 78 file input blocks, covering only 5 different types.

I was wondering whether Logstash would be more efficient if I consolidated all of the inputs of each type (prn, csv, etc.) into a single file input block, so that there are only 5 file input blocks (one per type)?

You'd do far better to use filebeat instead. It's lighter weight and will do better with tracking multiple files. A single instance of filebeat can have multiple prospectors, and each prospector can track multiple files, presumably of similar layout. Filebeat can also apply arbitrary labels and tags to files coming from a given prospector so you can act on them differently within Logstash.
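
As a rough sketch of the Logstash side of that setup: the block below assumes each Filebeat prospector attaches a custom field named log_type (a hypothetical name, not something from this thread) and that Filebeat ships to the conventional Beats port 5044. By default Filebeat nests custom fields under "fields", which is why the conditional reads [fields][log_type].

	input {
		beats {
			# Assumed port; use whatever the Filebeat output is configured for.
			port => 5044
		}
	}
	filter {
		# "log_type" is a hypothetical label set per prospector in Filebeat.
		if [fields][log_type] == "csv" {
			csv {
				separator => ","
			}
		}
	}

Other types would get their own conditional branches in the same way.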

I do not want to install a service. My usage of the ELK stack requires me to be able to have a zipped folder containing all components of ELK. The user will grab the zip folder, unzip it, run 2 batch scripts starting the correct components, do the analyzing and stop the 2 batch scripts when done. The folder can then be deleted from the computer to remove all traces of ELK.

Because of this, I do not think I can use Filebeat, since it appears to require installation on the host computer. Or am I misreading the documentation?

We no longer refer to it as ELK, because Beats are very much a part of the stack. We call it the Elastic Stack now.

You can indeed run filebeat from a locally installed folder. There is nothing preventing you from doing so.

How would I do this? The documentation here only discusses installing it with Powershell (for Windows).

But the download page has Win32 and Win64 downloads, which are Zip files.

Yes. I have downloaded and unzipped it to a folder I want to run it from. How do I run filebeat without installing it as a service? There aren't clear instructions in the documentation. Do I simply run the exe file?

That is correct. There are some flags you will doubtless have to add/configure/set, but that's it.

After doing some research and playing with Filebeat, I found that it cannot recursively search through all subdirectories of a given directory. According to the documentation:

Currently it is not possible to recursively fetch all files in all subdirectories of a directory.

My use case requires this feature, and the file input in Logstash provides it. Is this a feature they are looking at adding in a future release?

This also brings me back to my original question:

I was wondering whether Logstash would be more efficient if I consolidated all of the inputs of each type (prn, csv, etc.) into a single file input block, so that there are only 5 file input blocks (one per type)?

Thanks!

Not necessarily. I'm not 100% certain if the file input is multithreaded. If it is, then it won't matter much. If it isn't, then having multiple file inputs will be useful to speed up reads.

I highly recommend separating them out as much as possible anyway, simply for the purpose of specifying separate sincedb locations, otherwise there could be all kinds of traffic to a single sincedb, which can slow things down due to file locking, etc.

I currently have all 78 file inputs writing to a single sincedb file. Would you recommend each file input gets its own sincedb file? Or is it alright if a number of them share the same one?

Each file input block should have its own sincedb file.
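
For illustration only, here is roughly what that could look like for two of the blocks from the original post. The sincedb file names (sincedb_csv, sincedb_prn) are made up for this sketch. The second block also shows how two patterns of the same type can share one file input by passing an array to path, as the alarmprn block in the question already does. Whether to merge inputs like that is a separate trade-off, as discussed above.

	input {
		file {
			path => "${PWD}/data/**/admin_ui*.csv"
			start_position => beginning
			ignore_older => 0
			# Hypothetical file name: one sincedb file per input block.
			sincedb_path => "${PWD}/software_files/logstash-5.6.1/logstash_use/sincedb_csv"
			type => "csv"
		}
		file {
			# Two patterns of the same type handled by a single input.
			path => ["${PWD}/data/**/alarm_buffer*.prn", "${PWD}/data/**/alrm_server*.prn"]
			start_position => beginning
			ignore_older => 0
			# Hypothetical file name, distinct from the one above.
			sincedb_path => "${PWD}/software_files/logstash-5.6.1/logstash_use/sincedb_prn"
			type => "prn"
		}
	}

All other settings are carried over unchanged from the question.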
