Logstash File input Observation

PKD · December 14, 2015, 3:40pm

Hi all,

We have configured Logstash with a file input to scan files in a specified directory and index them into Elasticsearch. The path given to the file input is: /var/log/varnish/varnish-access*.log

Varnish generates a new log file every hour.

We have changed stat_interval for the file input to 1 hour:

stat_interval => 3600

All other parameters are left at their defaults.

Our observation is that with stat_interval set to 1 hour there is a lag in ingestion: only files that are more than 5-6 hours old are being scanned. We checked with lsof, and only files older than 5-6 hours are held open by Logstash, even though stat_interval is set to 1 hour.

Is there a delay in scanning (discovering new files) and processing if stat_interval is set to a higher value?

We also tried changing stat_interval to 1 minute, and Logstash then picked up the files faster.

Is there any relation between stat_interval and sincedb_write_interval?

magnusbaeck · December 14, 2015, 6:42pm

We also tried changing stat_interval to 1 minute, and Logstash then picked up the files faster.

Yes, of course. The whole purpose of the stat_interval option is to select how often Logstash checks if the files have been updated.

If you can explain why you're exploring absurdly long stat_interval values maybe we can provide better answers.

Is there any relation between stat_interval and sincedb_write_interval?

Well, there is a relation in the sense that as long as Logstash can keep up with the data being written to the monitored files the sincedb file will never be updated more frequently than stat_interval.

PKD · December 15, 2015, 5:46am

stat_interval only controls how often files that are already being watched are checked for updates. But does a longer stat_interval also delay the discovery of new files? New files are only being discovered after 5-6 hours, even though we are using the default discover_interval of 15 (the option that controls how often new files are looked for).
New log files are generated every hour, so we set stat_interval to 1 hour, which also reduces the number of stat calls per file.

wiibaa · December 15, 2015, 5:55am

Sorry to answer your question with source code, but to understand precisely how the two settings relate, you need to read the main loop that drives the file input:

https://github.com/jordansissel/ruby-filewatch/blob/master/lib/filewatch/watch.rb#L141-L155


    @logger.error("each: closed?: #{path}: (#{e.inspect})")
  end
end
return if quit?


# look at the ignored to see if its changed
watched_files.select {|wf| wf.ignored? }.each do |watched_file|
  path = watched_file.path
  break if quit?
  begin
    stat = watched_file.restat
    if watched_file.size_changed? || watched_file.inode_changed?(inode(path,stat))
      # if the ignored file changed, move it to the watched state
      # not to active state because we want to use MAX_OPEN_FILES throttling.
      # this file has not been yielded to the block yet

And the answer is yes, there is a relationship between the two: the effective discovery interval is stat_interval * discover_interval.
There is certainly room for improvement, but clarifying the documentation should come first.
Thanks for reminding me about this old problem.
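
To make the multiplication concrete, here is a minimal sketch of a polling loop in which discovery runs only once every discover_interval stat passes. It is a simplified model, not the ruby-filewatch source: the method name discovery_times, its keyword arguments, and the virtual clock are all made up for illustration, and nothing actually sleeps or touches the filesystem.

```ruby
# Simplified model of a stat/discover polling loop (NOT the ruby-filewatch code).
# Returns the simulated times, in seconds, at which a discovery pass would run.
def discovery_times(stat_interval:, discover_interval:, iterations:)
  clock = 0    # virtual clock in seconds; no real sleeping happens
  passes = 0   # stat passes completed since the last discovery
  times = []
  iterations.times do
    # a real watcher would stat the already-known files here
    passes += 1
    if passes == discover_interval
      times << clock  # a real watcher would re-expand the glob here
      passes = 0
    end
    clock += stat_interval  # stands in for sleep(stat_interval)
  end
  times
end

times = discovery_times(stat_interval: 3600, discover_interval: 15, iterations: 45)
gaps = times.each_cons(2).map { |earlier, later| later - earlier }
puts "discovery passes at: #{times.inspect}"
puts "gap between passes:  #{gaps.uniq.inspect} seconds"
```

Under these assumptions the gap between two discovery passes is always stat_interval * discover_interval seconds, because the counter only advances once per sleep.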

PKD · December 15, 2015, 9:51am

Thanks a lot for the reply. This was very helpful.

So this means that if we have
stat_interval => 3600
discover_interval => 15 (default for file input)

then Logstash will not discover any new files for 15 hours.
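
As a quick check of that arithmetic, the sketch below compares the two stat_interval values tried in this thread. The helper effective_discover_seconds is hypothetical, not part of Logstash, and it assumes the stat_interval * discover_interval relationship described above, with discover_interval left at 15.

```ruby
# Hypothetical helper: approximate time between discovery passes, assuming
# discovery runs once every discover_interval stat passes.
def effective_discover_seconds(stat_interval, discover_interval = 15)
  stat_interval * discover_interval
end

# The two stat_interval settings mentioned in the thread.
{ "1 hour" => 3600, "1 minute" => 60 }.each do |label, stat_interval|
  seconds = effective_discover_seconds(stat_interval)
  puts format("stat_interval %-8s -> discovery every %d s (%.2f h)",
              label, seconds, seconds / 3600.0)
end
```

With these inputs the helper gives 54000 seconds (15 hours) for stat_interval => 3600 and 900 seconds (15 minutes) for stat_interval => 60, which fits the faster pickup seen with the 1 minute setting.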
