I am testing the load balancer feature, and in my tests filebeat seems to stall after a while.
I have an ELK cluster with 4 LS nodes. I have a loop constantly generating messages to /var/log/messages, and when I start filebeat it captures all new changes at that moment and submits them to LS, but after that it basically stalls until I restart it, even if I take down the LS node it connected to.
@jhidalgo Config looks good. Which system are you running it on? Can you start filebeat with the -d "*" (debug output) option and post some log entries here as soon as filebeat stalls?
I see your config has only one logstash instance configured. In this case, the load balancing mode will not be loaded/initialized; you need at least two instances configured. If filebeat stalls with only one host configured, the issue is not related to the load-balancing mode. Still, filebeat might stall if all known logstash instances become unavailable. Have you tried running with -d '*,publish'? It would be interesting to see some debug/error output.
Also, load balancing in filebeat is currently limited, because filebeat waits for an OK from logstash before updating some internal state on disk that deals with filebeat restarts. There are still some remnants originating from logstash-forwarder which might be addressed in the future. Related filebeat issue: https://github.com/elastic/filebeat/issues/106
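For illustration, a minimal sketch of what a logstash output section with load balancing enabled could look like (host names and port 5044 are placeholders, not taken from your config):

output:
  logstash:
    # load balancing only kicks in with two or more hosts configured
    hosts: ["logstash-1:5044", "logstash-2:5044"]
    loadbalance: true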
Load Balancing:
I am using a DNS entry that resolves to 4 A records. From an administrative point of view it is much better to add and remove IPs at the DNS level than to change configuration files for multiple services. I am assuming Filebeat will consider those multiple IPs, but please let me know if that's not the case.
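To make the comparison concrete, a sketch of the two ways the hosts could be listed (names and port are placeholders; whether filebeat actually spreads load across all A records behind a single name is exactly what I'm asking):

output:
  logstash:
    # single DNS name that resolves to 4 A records
    hosts: ["logstash.example.internal:5044"]
    # vs. listing every node explicitly and letting filebeat balance:
    # hosts: ["ls-1:5044", "ls-2:5044", "ls-3:5044", "ls-4:5044"]
    # loadbalance: true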
File changes not detected: I am monitoring /var/log/messages with FB and creating new entries with logger, but this is what -d "*,publish" shows:
prospector.go:163: DBG Check file for harvesting: /var/log/messages
prospector.go:280: DBG Update existing file for harvesting: /var/log/messages
prospector.go:322: DBG Not harvesting, file didn't change: /var/log/messages
output.go:37: DBG output worker: no events to publish
output.go:37: DBG output worker: no events to publish
No matter how many lines I append to /var/log/messages, those changes are not detected by FB.
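For reference, the prospector behind this is essentially just the following (a simplified sketch rather than the exact config file; input_type is the default anyway):

filebeat:
  prospectors:
    -
      paths:
        - /var/log/messages   # appended to in a loop with logger
      input_type: log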
OS:
CentOS Linux release 7.1.1503 (Core)
Derived from Red Hat Enterprise Linux 7.1 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
Setting load balancing aside, when FB starts it finds new entries and submits them to LS, but once all new entries have been found it seems to stall, no longer detecting or submitting new entries as I would expect given the configuration settings.
I'm not sure if this is related - happy to start a new thread if not.
I'm running a pipeline latency test for my ELK cluster: writing timestamp info to a file every second, processing it at each Logstash node to calculate the delay at that node, and graphing the results when they arrive at ES.
I've noticed that filebeat is not always detecting that the log file has updated, even with scan_frequency set at 1s. Instead it decides the file hasn't changed, and then periodically sends batches of messages to catch up.
As you can imagine this is terrible for a latency test - the result is irregular sawtooth graphs.
Note that the log file is indeed being updated every second. The script writing it is just a simple "echo / sleep 1" type of thing, and I've tailed it during this test to double check.
I'm also sure the scan_frequency setting is doing what it says - the "Not harvesting, file didn't change" events occur every second. The problem appears to be that it's not detecting that the log file has changed at every interval.
Edit: I should point out that this happens immediately on startup, not after any particular waiting period.
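For completeness, a sketch of the prospector settings described above (the 1s scan_frequency is what I have set; the path is a placeholder, not my real file name):

filebeat:
  prospectors:
    -
      paths:
        - /var/log/latency-test.log   # placeholder for the file the timestamp script writes
      scan_frequency: 1s              # how often the prospector re-scans the configured paths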
Hi @ceekay, as part of this PR we just merged, a configurable parameter was introduced to control how often we check an open file for new writes. It's called backoff and the default is 1s.
This change has significantly sped up our automated tests, so I have the feeling it will also help with the issue you are seeing. We'll have the option in RC1, which we are currently working on releasing, but I can also trigger a nightly build if you want to test it before that.
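As a sketch, the option sits at the prospector level next to scan_frequency (the 1s backoff is the default mentioned above; the path and scan_frequency are just carried over from the earlier example):

filebeat:
  prospectors:
    -
      paths:
        - /var/log/latency-test.log
      scan_frequency: 1s
      backoff: 1s   # how long the harvester waits before checking an open file for new lines again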
Thanks for the updates. This sounds like it will resolve my issue - I'll check and let you know.
I understand the overhead increase for a very small spooler, but this case is specifically for pipeline latency monitoring and is quite low volume - one message per second, on one host.
@steffens @ruflin I'm seeing the same behaviour as @jhidalgo - filebeat works as expected for a while, but then suddenly stops detecting log changes. After that, it continues to scan the files but decides: "Not harvesting, file didn't change".
I'm using the current head of the filebeat repo, because I needed the force_close_on_windows_files option. OS is Windows Server 2012 R2.
Setting spool_size to 1 immediately fixed my issue. Max latency at the first node has been reduced from ~8 seconds to 1 second, and min latency is ~0.05 secs, which is about the same as I was seeing before testing filebeat.
With spool_size at 1, I don't see much point in changing idle_timeout, as the output will fire as soon as an event is available.
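For anyone hitting the same latency problem, the change boils down to this (a sketch of the general filebeat settings; idle_timeout is shown only to note it is left alone):

filebeat:
  spool_size: 1        # flush the spooler after every single event
  # idle_timeout: 5s   # flush interval; largely irrelevant once spool_size is 1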