I have two general questions. I'm using Filebeat to send logs to Logstash. I notice that when I change a file that is monitored by Filebeat, the event is published with a delay of ~5-6 seconds. Is this normal? Is this something configurable?
Also, I notice that Filebeat checks the file for any change every 10 seconds:
2017/09/11 13:02:39.322598 prospector_log.go:226: DBG Check file for harvesting: /G/debug.LOG
2017/09/11 13:02:39.322617 prospector_log.go:259: DBG Update existing file for harvesting: /G/debug.LOG, offset: 0
2017/09/11 13:02:39.322629 prospector_log.go:313: DBG File didn't change: /G/debug.LOG
2017/09/11 13:02:39.322644 prospector_log.go:91: DBG Prospector states cleaned up. Before: 2, After: 2
2017/09/11 13:02:39.322884 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/11 13:02:44.323022 spooler.go:89: DBG Flushing spooler because of timeout. Events flushed: 0
2017/09/11 13:02:49.322793 prospector.go:183: DBG Run prospector
2017/09/11 13:02:49.322818 prospector_log.go:70: DBG Start next scan
2017/09/11 13:02:49.322850 prospector_log.go:226: DBG Check file for harvesting: /G/debug.LOG
Can I change this period somehow, for example to check for changes every 2 seconds? I searched the yml configuration file but couldn't find any relevant timer.
Thanks for the quick reply. Before I posted the question, I had tried changing these values with no result. For example:
// Backoff values define how aggressively filebeat crawls new files for updates
// The default values can be used in most cases. Backoff defines how long it is waited
//to check a file again after EOF is reached. Default is 1s which means the file
// is checked every second if new lines were added. This leads to a near real time crawling.
//Every time a new line appears, backoff is reset to the initial value.
backoff: 15s
//Max backoff defines what the maximum backoff time is. After having backed off multiple times
//from checking the files, the waiting time will never exceed max_backoff independent of the
//backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
//file after having backed off multiple times, it takes a maximum of 10s to read the new line
max_backoff: 20s
Even though the backoff timer is set to 15 seconds, Filebeat still checks the file every 10 seconds; it ignores the setting. Also, what about the delay in submitting the events that I mentioned? I assume the backoff settings are not related to that.
//Paths that should be crawled and fetched. Glob based paths.
paths:
- /Giannis/debug.LOG
.....
// Backoff values define how aggressively filebeat crawls new files for updates
// The default values can be used in most cases. Backoff defines how long it is waited
//to check a file again after EOF is reached. Default is 1s which means the file
// is checked every second if new lines were added. This leads to a near real time crawling.
//Every time a new line appears, backoff is reset to the initial value.
backoff: 15s
//Max backoff defines what the maximum backoff time is. After having backed off multiple times
//from checking the files, the waiting time will never exceed max_backoff independent of the
//backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
//file after having backed off multiple times, it takes a maximum of 10s to read the new line
max_backoff: 20s
There is nothing weird or strange in my yml file, and I have no backoff_factor setting (there is no change if backoff_factor is set to 1 either). There is only one prospector and the beat monitors only one file; I'm trying to keep my test as simple as possible. However, the "check file" period just doesn't change, whatever values I put in the backoff settings.
In conclusion, the period at which the beat checks the file does not change, regardless of the values I use for the backoff settings. Do you need any logs in order to investigate?
Please post the exact config that you are using and format it with the </> button.
Using the configuration I provided above, the log seems to confirm that it is working as expected. Each check is ~500ms apart.
2017/09/12 12:04:59.411953 log_file.go:84: DBG End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now.
2017/09/12 12:04:59.913243 log_file.go:84: DBG End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now.
2017/09/12 12:05:00.414592 log_file.go:84: DBG End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now.
2017/09/12 12:05:00.915991 log_file.go:84: DBG End of file reached: /Users/akroh/Downloads/filebeat-5.5.2-darwin-x86_64/my.log; Backoff now
#
#################### Filebeat Configuration Example #########################
// This file is an example configuration file highlighting only the most common
/ options. The filebeat.full.yml file from the same directory contains all the
// supported options with more comments. You can use it as a reference.
#
//# You can find the full configuration reference here:
// https://www.elastic.co/guide/en/beats/filebeat/index.html
#=========================== Filebeat prospectors =============================
filebeat.prospectors:
// Each - is a prospector. Most options can be set at the prospector level, so
// you can use different prospectors for various configurations.
// Below are the prospector specific configurations.
- input_type: log
// json.keys_under_root: true
// Paths that should be crawled and fetched. Glob based paths.
paths:
- /Giannis/debug.LOG
#- c:\programdata\elasticsearch\logs\*
// Exclude lines. A list of regular expressions to match. It drops the lines that are
// matching any regular expression from the list.
//exclude_lines: ["^DBG"]
// Include lines. A list of regular expressions to match. It exports the lines that are
// matching any regular expression from the list.
// include_lines: ["^ERR", "^WARN"]
//Exclude files. A list of regular expressions to match. Filebeat drops the files that
// are matching any regular expression from the list. By default, no files are dropped.
//exclude_files: [".gz$"]
// Optional additional fields. These field can be freely picked
// to add additional information to the crawled log files for filtering
//fields:
// level: debug
// review: 1
// The number of seconds to wait for new events between two producer API calls.
flush_interval: 1s
//Backoff values define how aggressively filebeat crawls new files for updates
//The default values can be used in most cases. Backoff defines how long it is waited
// to check a file again after EOF is reached. Default is 1s which means the file
//is checked every second if new lines were added. This leads to a near real time crawling.
//Every time a new line appears, backoff is reset to the initial value.
backoff: 1s
// Max backoff defines what the maximum backoff time is. After having backed off multiple times
// from checking the files, the waiting time will never exceed max_backoff independent of the
// backoff factor. Having it set to 10s means in the worst case a new line can be added to a log
// file after having backed off multiple times, it takes a maximum of 10s to read the new line
max_backoff: 10s
//backoff_factor: 1
// Multiline options
// Mutiline can be used for log messages spanning multiple lines. This is common
//for Java Stack Traces or C-Line Continuation
// The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
// multiline.pattern: ^\[
// multiline.negate: true
// multiline.match: after
// Defines if the pattern set under pattern should be negated or not. Default is false.
//multiline.negate: false
// Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
// that was (not) matched before or after or as long as a pattern is not matched based on negate.
// Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
//multiline.match: after
// Type to be published in the 'type' field. For Elasticsearch output,
// the type defines the document type these entries should be stored
// in. Default: log
document_type: log
//================================ General =====================================
// The name of the shipper that publishes the network data. It can be used to group
// all the transactions sent by a single shipper in the web interface.
name: giannis4
//The tags of the shipper are included in their own field with each
// transaction published.
//tags: ["service-X", "web-tier"]
// Optional fields that you can specify to add additional information to the
// output.
//fields:
// env: staging
//================================ Outputs =====================================
// Configure what outputs to use when sending the data collected by the beat.
// Multiple outputs may be used.
#-------------------------- Elasticsearch output ------------------------------
//output.elasticsearch:
// Array of hosts to connect to.
//hosts: ["http://es-cluster.lotos10.dev:80"]
// Optional protocol and basic auth credentials.
//protocol: "https"
//username: "elastic"
//password: "changeme"
//----------------------------- Logstash output --------------------------------
output.logstash:
// The Logstash hosts
hosts: ["10.70.130.121:5044"]
//Optional SSL. By default is off.
// List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
// Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
// Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
#================================ Logging =====================================
/ Sets log level. The default log level is info.
/ Available log levels are: critical, error, warning, info, debug
logging.level: debug
/logging.to_files: true
/logging.to_syslog: false
//logging.files:
// path: /Giannis
// name: filebeatGiannis.log
// keepfiles: 7
//At debug level, you can selectively enable logging only for some components.
// To enable all selectors use ["*"]. Examples of other selectors are "beat",
/"publish", "service".
#logging.selectors: ["*"]
The only valid way to start a comment in YAML is #, so I'm confused how the // could ever work (Filebeat will not start with this in your config file). But looking past that, your config is basically:
filebeat.prospectors:
- input_type: log
  paths:
    - /Giannis/debug.LOG
  flush_interval: 1s  # Wrong place for this.
  backoff: 1s         # Default value is 1s.
  max_backoff: 10s    # Default value is 10s.
  document_type: log
output.logstash:
  hosts: ["10.70.130.121:5044"]
You haven't changed any of the backoff settings, so I would expect the defaults to apply. With those defaults you should see at most 10s between checks of the file.
I see you have a flush_interval option defined under the prospector config. That is an option that applies to the output (like output.logstash.flush_interval).
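Roughly, if you want to keep that option it would go under the output section instead, e.g. (a sketch based on the description above, reusing the host from your config; adjust the value as needed):
output.logstash:
  hosts: ["10.70.130.121:5044"]
  # flush_interval on the output, as mentioned above
  flush_interval: 1s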
The "//" were inserted by mistake. So, according to you, if I want to check the file every 2 secs or every 20 secs, I should put in the max_backoff setting to 2 or 20 respectively? Is this true?
max_backoff sets the upper bound of the polling interval. The polling interval is increased using an exponential backoff when the end of file (EOF) is reached. When Filebeat receives the first EOF it will wait the backoff interval before trying again. After receiving a second consecutive EOF it will increase the waiting period based on the backoff value and the backoff_factor. When the waiting interval reaches the max_backoff value it will stop increasing (aka truncated exponential backoff).
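As a rough illustration of that truncated exponential backoff (hypothetical values, only for showing the sequence):
backoff: 1s         # wait after the first EOF
backoff_factor: 2   # each consecutive EOF multiplies the wait by this factor
max_backoff: 10s    # the wait never grows beyond this value
# Resulting waits after consecutive EOFs: 1s, 2s, 4s, 8s, 10s, 10s, ...
# As soon as a new line is read, the wait resets to the initial backoff (1s).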
If you want to check the file for new content consistently every X seconds then disable backoff using backoff_factor: 1 and set backoff: Xs.
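For example, a minimal prospector sketch that polls the file every 2 seconds (reusing the example path from your config):
filebeat.prospectors:
- input_type: log
  paths:
    - /Giannis/debug.LOG
  backoff: 2s         # wait 2s after each EOF before checking again
  backoff_factor: 1   # no exponential increase, so the wait stays at 2s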