"Check file" timer and "send events" delay

Hello,

I have two general questions. I'm using Filebeat to send logs to Logstash. I notice that when I change a file (that is monitored by Filebeat), the event is published with a delay of ~5-6 seconds. Is this normal? Is this configurable?

Also, I notice that Filebeat checks the file for any change every 10 seconds:

2017/09/11 13:02:39.322598 prospector_log.go:226: DBG Check file for harvesting: /G/debug.LOG
2017/09/11 13:02:39.322617 prospector_log.go:259: DBG Update existing file for harvesting: /G/debug.LOG, offset: 0
2017/09/11 13:02:39.322629 prospector_log.go:313: DBG File didn't change: /G/debug.LOG
2017/09/11 13:02:39.322644 prospector_log.go:91: DBG Prospector states cleaned up. Before: 2, After: 2
2017/09/11 13:02:39.322884 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/11 13:02:44.323022 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/11 13:02:49.322793 prospector.go:183: DBG Run prospector
2017/09/11 13:02:49.322818 prospector_log.go:70: DBG Start next scan
2017/09/11 13:02:49.322850 prospector_log.go:226: DBG Check file for harvesting: /G/debug.LOG

Can I change this period somehow? For example, to check for changes every 2 seconds. I searched the yml configuration file but couldn't find any related timer.

I'm using Filebeat 5.5.2 on Ubuntu Linux.

Thanks in advance,
Giannis

Check out the documentation for the backoff settings. https://www.elastic.co/guide/en/beats/filebeat/current/configuration-filebeat-options.html#_backoff

Thanks for the quick reply. Before posting the question, I had already tried changing these values, with no result. For example:

// Backoff values define how aggressively filebeat crawls new files for updates
// The default values can be used in most cases. Backoff defines how long it is waited
//to check a file again after EOF is reached. Default is 1s which means the file
// is checked every second if new lines were added. This leads to a near real time crawling.
//Every time a new line appears, backoff is reset to the initial value.
backoff: 15s

//Max backoff defines what the maximum backoff time is. After having backed off multiple times
//from checking the files, the waiting time will never exceed max_backoff independent of the
//backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
//file after having backed off multiple times, it takes a maximum of 10s to read the new line
max_backoff: 20s

Even though the backoff timer is set to 15 seconds, Filebeat still checks the file every 10 seconds; it ignores the setting. Also, what about the delay in publishing the events that I mentioned? I assume the backoff settings are not related to that.

Thanks,
Giannis

Those settings apply at the prospector level, so double-check your indentation.

filebeat.prospectors:
- paths: ['/logs/mylog']
  backoff: 500ms
  backoff_factor: 1 # Disable the exponential increase

There are other settings that influence how quickly the event is delivered to Elasticsearch, namely the flush_interval and the bulk_max_size.
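For example, something along these lines under the Logstash output section (the option names follow the settings mentioned above and the values are only placeholders, so double-check them against the Filebeat 5.5 output reference):

output.logstash:
  hosts: ["10.70.130.121:5044"]
  bulk_max_size: 1024   # maximum number of events bundled into a single Logstash batch
  flush_interval: 1s    # how long an incomplete batch waits before it is sent anyway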

And then once the event is received by Elasticsearch it won't be searchable for up to 1s (the default refresh interval). https://www.elastic.co/guide/en/elasticsearch/guide/2.x/near-real-time.html#refresh-api

These are my settings in the yml file:

//Paths that should be crawled and fetched. Glob based paths.
paths:
- /Giannis/debug.LOG

.....

// Backoff values define how aggressively filebeat crawls new files for updates
// The default values can be used in most cases. Backoff defines how long it is waited
//to check a file again after EOF is reached. Default is 1s which means the file
// is checked every second if new lines were added. This leads to a near real time crawling.
//Every time a new line appears, backoff is reset to the initial value.
backoff: 15s

//Max backoff defines what the maximum backoff time is. After having backed off multiple times
//from checking the files, the waiting time will never exceed max_backoff independent of the
//backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
//file after having backed off multiple times, it takes a maximum of 10s to read the new line
max_backoff: 20s

There is nothing weird or strange in my yml file, and I have no backoff_factor setting (there is no change if backoff_factor is set to 1 either). There is only one prospector, and the beat monitors only one file; I'm trying to keep the test as simple as possible. However, the "check file" period just doesn't change, whatever values I put in the backoff settings.

In short, the period at which the beat checks the file does not change, no matter what values I put in the backoff settings. Do you need any logs in order to investigate?

Please post the exact config that you are using and format it with the </> button.

Using the configuration I provided above, the log seems to confirm that it is working as expected. The checks are ~500ms apart.

2017/09/12 12:04:59.411953 log_file.go:84: DBG  End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now.
2017/09/12 12:04:59.913243 log_file.go:84: DBG  End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now.
2017/09/12 12:05:00.414592 log_file.go:84: DBG  End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now.
2017/09/12 12:05:00.915991 log_file.go:84: DBG  End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now

This is my configuration file:

#
#################### Filebeat Configuration Example #########################

// This file is an example configuration file highlighting only the most common
/ options. The filebeat.full.yml file from the same directory contains all the
// supported options with more comments. You can use it as a reference.
#
//# You can find the full configuration reference here:
// https://www.elastic.co/guide/en/beats/filebeat/index.html

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

// Each - is a prospector. Most options can be set at the prospector level, so
// you can use different prospectors for various configurations.
// Below are the prospector specific configurations.

- input_type: log
 // json.keys_under_root: true

  // Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /Giannis/debug.LOG

    #- c:\programdata\elasticsearch\logs\*

  // Exclude lines. A list of regular expressions to match. It drops the lines that are
 // matching any regular expression from the list.
  //exclude_lines: ["^DBG"]

  // Include lines. A list of regular expressions to match. It exports the lines that are
  // matching any regular expression from the list.
 // include_lines: ["^ERR", "^WARN"]

 //Exclude files. A list of regular expressions to match. Filebeat drops the files that
 // are matching any regular expression from the list. By default, no files are dropped.
  //exclude_files: [".gz$"]

  // Optional additional fields. These field can be freely picked
  // to add additional information to the crawled log files for filtering
//fields:
//   level: debug
 //  review: 1

// The number of seconds to wait for new events between two producer API calls.
  flush_interval: 1s

//Backoff values define how aggressively filebeat crawls new files for updates
  //The default values can be used in most cases. Backoff defines how long it is waited
 // to check a file again after EOF is reached. Default is 1s which means the file
  //is checked every second if new lines were added. This leads to a near real time crawling.
 //Every time a new line appears, backoff is reset to the initial value.
  backoff: 1s

 // Max backoff defines what the maximum backoff time is. After having backed off multiple times
 // from checking the files, the waiting time will never exceed max_backoff independent of the
  // backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
  // file after having backed off multiple times, it takes a maximum of 10s to read the new line
  max_backoff: 10s

 //backoff_factor: 1


// Multiline options

 // Mutiline can be used for log messages spanning multiple lines. This is common
  //for Java Stack Traces or C-Line Continuation

  // The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
 // multiline.pattern: ^\[
 // multiline.negate: true
 // multiline.match: after

 // Defines if the pattern set under pattern should be negated or not. Default is false.
 //multiline.negate: false

  // Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
 // that was (not) matched before or after or as long as a pattern is not matched based on negate.
  // Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  //multiline.match: after

// Type to be published in the 'type' field. For Elasticsearch output,
  // the type defines the document type these entries should be stored
 // in. Default: log
  document_type: log

//================================ General =====================================

// The name of the shipper that publishes the network data. It can be used to group
// all the transactions sent by a single shipper in the web interface.
name: giannis4

//The tags of the shipper are included in their own field with each
// transaction published.
//tags: ["service-X", "web-tier"]

// Optional fields that you can specify to add additional information to the
// output.
//fields:
//  env: staging

//================================ Outputs =====================================

// Configure what outputs to use when sending the data collected by the beat.
// Multiple outputs may be used.

#-------------------------- Elasticsearch output ------------------------------
//output.elasticsearch:
  // Array of hosts to connect to.
 //hosts: ["http://es-cluster.lotos10.dev:80"]

 // Optional protocol and basic auth credentials.
  //protocol: "https"
  //username: "elastic"
  //password: "changeme"

//----------------------------- Logstash output --------------------------------
output.logstash:
// The Logstash hosts
  hosts: ["10.70.130.121:5044"]

 //Optional SSL. By default is off.
 // List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

// Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  // Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Logging =====================================

/ Sets log level. The default log level is info.
/ Available log levels are: critical, error, warning, info, debug
logging.level: debug

/logging.to_files: true
/logging.to_syslog: false
//logging.files:
//	path: /Giannis
//	name: filebeatGiannis.log
//	keepfiles: 7

//At debug level, you can selectively enable logging only for some components.
// To enable all selectors use ["*"]. Examples of other selectors are "beat",
/"publish", "service".
#logging.selectors: ["*"]

And these are the logs:
2017/09/12 12:15:28.664454 prospector_log.go:70: DBG Start next scan
2017/09/12 12:15:28.664497 prospector_log.go:226: DBG Check file for harvesting: /Giannis/debug.LOG
2017/09/12 12:15:28.664517 prospector_log.go:259: DBG Update existing file for harvesting: /Giannis/debug.LOG, offset: 304
2017/09/12 12:15:28.664532 prospector_log.go:311: DBG Harvester for file is still running: /Giannis/debug.LOG
2017/09/12 12:15:28.664539 prospector_log.go:91: DBG Prospector states cleaned up. Before: 2, After: 2
2017/09/12 12:15:28.664651 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:15:33.664743 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:15:33.665109 log_file.go:84: DBG End of file reached: /Giannis/debug.LOG; Backoff now.
2017/09/12 12:15:38.664720 prospector.go:183: DBG Run prospector
2017/09/12 12:15:38.664738 prospector_log.go:70: DBG Start next scan
2017/09/12 12:15:38.664783 prospector_log.go:226: DBG Check file for harvesting: /Giannis/debug.LOG
2017/09/12 12:15:38.664802 prospector_log.go:259: DBG Update existing file for harvesting: /Giannis/debug.LOG, offset: 304
2017/09/12 12:15:38.664817 prospector_log.go:311: DBG Harvester for file is still running: /Giannis/debug.LOG
2017/09/12 12:15:38.664824 prospector_log.go:91: DBG Prospector states cleaned up. Before: 2, After: 2
2017/09/12 12:15:38.664862 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:15:43.665032 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:15:43.665263 log_file.go:84: DBG End of file reached: /Giannis/debug.LOG; Backoff now.
2017/09/12 12:15:48.664984 prospector.go:183: DBG Run prospector
2017/09/12 12:15:48.665011 prospector_log.go:70: DBG Start next scan
2017/09/12 12:15:48.665044 prospector_log.go:226: DBG Check file for harvesting: /Giannis/debug.LOG
2017/09/12 12:15:48.665059 prospector_log.go:259: DBG Update existing file for harvesting: /Giannis/debug.LOG, offset: 304
2017/09/12 12:15:48.665068 prospector_log.go:311: DBG Harvester for file is still running: /Giannis/debug.LOG
2017/09/12 12:15:48.665080 prospector_log.go:91: DBG Prospector states cleaned up. Before: 2, After: 2
2017/09/12 12:15:48.665135 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:15:53.665337 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:15:53.665366 log_file.go:84: DBG End of file reached: /Giannis/debug.LOG; Backoff now.
2017/09/12 12:15:58.662405 metrics.go:34: INFO No non-zero metrics in the last 30s
2017/09/12 12:15:58.665220 prospector.go:183: DBG Run prospector
2017/09/12 12:15:58.665237 prospector_log.go:70: DBG Start next scan
2017/09/12 12:15:58.665283 prospector_log.go:226: DBG Check file for harvesting: /Giannis/debug.LOG
2017/09/12 12:15:58.665306 prospector_log.go:259: DBG Update existing file for harvesting: /Giannis/debug.LOG, offset: 304
2017/09/12 12:15:58.665319 prospector_log.go:311: DBG Harvester for file is still running: /Giannis/debug.LOG
2017/09/12 12:15:58.665336 prospector_log.go:91: DBG Prospector states cleaned up. Before: 2, After: 2
2017/09/12 12:15:58.665481 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:16:03.665555 log_file.go:84: DBG End of file reached: /Giannis/debug.LOG; Backoff now.
2017/09/12 12:16:03.665565 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/12 12:16:08.665490 prospector.go:183: DBG Run prospector

The only valid way to start a comment in YAML is #, so I'm confused how the // could ever work (Filebeat will not start with this in your config file). But looking past that, your config is basically:

filebeat.prospectors:
- input_type: log 
  paths:
    - /Giannis/debug.LOG
  flush_interval: 1s # Wrong place for this.
  backoff: 1s # Default value is 1s.
  max_backoff: 10s # Default value is 10s.
  document_type: log 

output.logstash:
  hosts: ["10.70.130.121:5044"]

You haven't changed any of the backoff settings, so I would expect the defaults to apply. With those defaults you should see at most 10s between checks of the file.

I see you have a flush_interval option defined under the prospector config. That is an option that applies to the output (like output.logstash.flush_interval).

The "//" were inserted by mistake. So, according to you, if I want to check the file every 2 secs or every 20 secs, I should put in the max_backoff setting to 2 or 20 respectively? Is this true?

max_backoff sets the upper bound of the polling interval. The polling interval is increased using an exponential backoff once the end of file (EOF) is reached. When Filebeat receives the first EOF, it waits the backoff interval before trying again. After a second consecutive EOF, it increases the waiting period based on the backoff value and the backoff_factor. Once the waiting interval reaches the max_backoff value, it stops increasing (i.e. truncated exponential backoff).
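For example, assuming the defaults of backoff: 1s, max_backoff: 10s, and (if I recall correctly) backoff_factor: 2, the wait between consecutive EOF checks grows roughly as 1s, 2s, 4s, 8s, 10s, 10s, ... and resets back to 1s as soon as a new line is read.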

If you want to check the file for new content consistently every X seconds, disable the exponential backoff with backoff_factor: 1 and set backoff: Xs, as in the sketch below.
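A minimal prospector sketch that should poll the file every 2 seconds (keeping the rest of your config as it is):

filebeat.prospectors:
- input_type: log
  paths:
    - /Giannis/debug.LOG
  backoff: 2s
  backoff_factor: 1   # disable the exponential increase
  max_backoff: 2s     # redundant once the factor is 1, but makes the intent explicit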


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.