I am testing the load-balancer feature, and in my test it seems like filebeat stalls after a while.
I have an ELK cluster with 4 Logstash (LS) nodes. I have a loop constantly writing messages to /var/log/messages, and when I start filebeat it captures all the new entries present at that moment and submits them to LS, but after that it basically stalls until I restart it, even if I take down the LS node it connected to.
Am I missing something ?
Here is my configuration:
@jhidalgo Config looks good. Which system are you running it on? Can you start filebeat with the -d "*" (debug output) option and post some log entries here as soon as filebeat stalls?
I see your config has only one logstash instance configured. In that case, the load-balancing mode will not be loaded/initialized; you need at least two instances configured. If filebeat stalls with only one host configured, the issue is not related to the load-balancing mode. Still, filebeat might stall if all known logstash instances become unavailable. Have you tried running with -d '*,publish'? It would be interesting to see some debug/error output.
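For reference, a minimal output section with load balancing enabled might look like this (hostnames and port are placeholders, and the nested layout assumes the filebeat 1.x config format):

```yaml
output:
  logstash:
    # At least two hosts are needed for load-balancing mode to be initialized.
    hosts: ["ls1.example.com:5044", "ls2.example.com:5044"]
    # Distribute events across all configured hosts instead of sticking to one.
    loadbalance: true
```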
Also, load balancing in filebeat is currently limited because filebeat waits for an ACK from logstash before updating some internal state on disk (used to deal with filebeat restarts). Some remnants originating from logstash-forwarder remain and might be addressed in the future. Related filebeat issue: https://github.com/elastic/filebeat/issues/106
I am using a DNS entry that responds with 4 A records. From an administrative point of view it is much better to add and remove IPs at the DNS level than changing configuration files for multiple services. I am assuming FileBeat will consider those multiple IPs, but please let me know if that's not the case.
File changes not detected. I am monitoring /var/log/messages with FB, and creating new entries with logger, but this is what the -d "*,publish" shows:
prospector.go:163: DBG Check file for harvesting: /var/log/messages
prospector.go:280: DBG Update existing file for harvesting: /var/log/messages
prospector.go:322: DBG Not harvesting, file didn't change: /var/log/messages
output.go:37: DBG output worker: no events to publish
output.go:37: DBG output worker: no events to publish
No matter how many lines I append to /var/log/messages, those changes are not detected by FB.
CentOS Linux release 7.1.1503 (Core)
Derived from Red Hat Enterprise Linux 7.1 (Source)
Load balancing aside, when FB starts it finds new entries and submits them to LS, but once all existing entries have been processed it seems to stall, neither detecting nor submitting new entries as I would expect given the configuration settings.
Thanks for your support
filebeat will resolve the list of IPs for every logstash host configured and choose one IP at random, retrying with another if the connection fails.
One more question: are you using network shares, or running in a VM with folder sharing enabled?
AWS EC2 instance without network shares
I'm not sure if this is related - happy to start a new thread if not.
I run a pipeline latency test for my ELK cluster; writing timestamp info to a file every second, processing it at each Logstash node to calculate the delay at that node, and graphing the results when they arrive at ES.
I've noticed that filebeat is not always detecting that the log file has updated, even with scan_frequency set to 1s. Instead it decides the file hasn't changed, and then periodically sends batches of messages to catch up.
As you can imagine this is terrible for a latency test - the result is irregular sawtooth graphs.
I've pasted a debug output of the issue at http://pastebin.com/uwUuE00k
Note that the log file is indeed being updated every second. The script writing it is just a simple "echo / sleep 1" type of thing, and I've tailed it during this test to double check.
I'm also sure the scan_frequency setting is doing what it says - the "Not harvesting, file didn't change" events occur every second. The problem appears to be that it's not detecting that the log file has changed at every interval.
Edit: I should point out that this happens immediately on startup, not after any particular waiting period.
Hi @ceekay, as part of this PR we just merged, a configurable parameter was introduced to control how often we check an open file for new writes. It's called backoff and the default is 1s.
This change has significantly sped up our automated tests, so I have a feeling it will also help with the issue you are seeing. The option will be in RC1, which we are currently working on releasing, but I can also trigger a nightly build if you want to test it before that.
Edit: actual link to the PR and some wording.
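For anyone else hitting this: the option described above goes under the prospector configuration, something like the following (the path and values are illustrative, not recommendations):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/messages
      # How often the prospector scans for new files and file changes.
      scan_frequency: 1s
      # How long the harvester waits before re-checking an open file
      # for new lines (the new option discussed above).
      backoff: 1s
```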
@ceekay filebeat has an internal spooler that collects lines from multiple events before publishing (think batching).
See the config options spool_size and idle_timeout to tune the spooler. The smaller the batches, the more processing overhead, potentially.
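As a sketch, those two options sit at the top level of the filebeat section; the values below are illustrative, not the defaults:

```yaml
filebeat:
  # Flush to the output once this many events have been collected...
  spool_size: 64
  # ...or after this much idle time has passed, whichever comes first.
  idle_timeout: 1s
```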
Thanks for the updates. This sounds like it will resolve my issue - I'll check and let you know.
I understand the overhead increase for a very small spooler, but this case is specifically for pipeline latency monitoring and is quite low volume - one message per second, on one host.
@steffens @ruflin I'm seeing the same behaviour as @jhidalgo - filebeat works as expected for a while, but then suddenly stops detecting log changes. After that, it continues to scan the files, but decides: "Not harvesting, file didn't change".
I'm using current head from the filebeat repo, because I needed the force_close_on_windows_files option. OS is Windows Server 2012 R2.
The prospectors are configured like this:
# General filebeat configuration options
Setting spool_size to 1 immediately fixed my issue. Max latency at the first node has dropped from ~8 seconds to 1 second, and min latency is ~0.05 s, which is about the same as I was seeing before testing filebeat.
With spool_size at 1, I don't see much point in changing idle_timeout, since the output will fire as soon as an event is available.
Thanks very much!
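In config terms, the fix described above boils down to a one-line change (a sketch; flushing one event at a time trades throughput for latency, so it only makes sense for low-volume monitoring like this):

```yaml
filebeat:
  spool_size: 1   # publish each event as soon as it is read
```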
@lhm @jhidalgo I think I found the same problem. Take a look at this issue.