Logstash File input Observation


(PKD) #1

Hi all,

We have configured Logstash with a file input to scan files from a specified directory and index them into Elasticsearch. The path given to the file input is: /var/log/varnish/varnish-access*.log

Varnish generates a new log file every hour.

We have changed stat_interval for the file input to 1 hour:

stat_interval => 3600

All other parameters are left at their defaults.

What we observe is that with stat_interval set to 1 hour, ingestion lags behind: only files older than about 6 hours are being scanned. We checked with lsof, and Logstash only has files older than 5-6 hours open, even though stat_interval is set to 1 hour.

Is there any delay in scanning (discovering new files) and processing if we set stat_interval to a higher value?

Also, when we changed stat_interval to 1 minute, Logstash picked up the files faster.

Is there any relation between stat_interval and sincedb_write_interval?


(Magnus Bäck) #2

Also, when we changed stat_interval to 1 minute, Logstash picked up the files faster.

Yes, of course. The whole purpose of the stat_interval option is to select how often Logstash checks if the files have been updated.

If you can explain why you're exploring absurdly long stat_interval values maybe we can provide better answers.

Is there any relation between stat_interval and sincedb_write_interval?

Well, there is a relation in the sense that as long as Logstash can keep up with the data being written to the monitored files the sincedb file will never be updated more frequently than stat_interval.


(PKD) #3

As we understand it, stat_interval only controls how often already-watched files are checked for updates. But does a longer stat_interval also delay the discovery of new files? New files are only being discovered after 5-6 hours, even though we are using the default discover_interval of 15 (the option that controls how often new files to watch are discovered).
New log files are generated every hour, so we set stat_interval to 1 hour. We also wanted to reduce the number of stat calls made for every file.


#4

Sorry to respond to your question with source code, but to understand precisely the relationship between the two settings, you need to read the main loop that drives the file input.

And the answer is yes! There is a relationship between the two: the effective discover interval is stat_interval * discover_interval.
There is certainly room for improvement, but a clarification in the documentation should come first.
Thanks for reminding me about this old problem :smiley:
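
To make the relationship concrete, here is a minimal Python sketch of a watch loop of that shape. It is not the plugin's actual code: the function name `discovery_times` and the use of a simulated clock instead of a real sleep are assumptions made for illustration. It only assumes what is described above, namely that a discovery pass runs once every discover_interval stat passes and that each stat pass is followed by a pause of stat_interval seconds.

```python
def discovery_times(stat_interval, discover_interval, total_seconds):
    """Simulate the watch loop and return the times (in seconds since
    start) at which a new-file discovery pass would run.

    Hypothetical helper for illustration; not part of Logstash.
    """
    times = []
    clock = 0    # simulated time, used instead of a real sleep()
    passes = 0   # stat passes since the last discovery pass
    while clock < total_seconds:
        # A stat pass over the already-known files would happen here.
        passes += 1
        if passes == discover_interval:
            times.append(clock)  # discovery pass (glob for new files)
            passes = 0
        clock += stat_interval   # stands in for sleep(stat_interval)
    return times


# Settings from this thread, simulated over 48 hours.
times = discovery_times(3600, 15, 48 * 3600)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print(times, gaps)
```

With these settings, consecutive discovery passes in the simulation are stat_interval * discover_interval seconds apart, which is the product mentioned above.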


(PKD) #5

Thanks a lot for the reply. This was very helpful.

So this means that if we have
stat_interval => 3600
discover_interval => 15 (default for file input)

then Logstash will not discover any new files for 15 hours.
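
As a quick check of that arithmetic, here is a small Python sketch comparing the two stat_interval values we tried. The helper name `effective_discover_seconds` is made up for illustration, and it assumes the effective interval is simply the product described in the previous reply, ignoring the time spent processing files.

```python
def effective_discover_seconds(stat_interval, discover_interval=15):
    """Approximate seconds between discovery passes, assuming the
    effective interval is stat_interval * discover_interval.

    Hypothetical helper; 15 is the default discover_interval quoted
    in this thread.
    """
    return stat_interval * discover_interval


# The two stat_interval values tried in this thread: 1 hour and 1 minute.
for stat_interval in (3600, 60):
    seconds = effective_discover_seconds(stat_interval)
    hours = seconds / 3600
    print(f"stat_interval={stat_interval}s: discovery about every {hours:g} h")
```

Under that assumption, the 1 minute setting would discover new files roughly every 15 minutes, which fits with files being picked up faster in that case.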

