File changes not detected after a while

jhidalgo · October 29, 2015, 8:54pm

I am checking the loadbalancer feature, in my test seems like filebeat stalls after a while.
I have an elk cluster with 4 LS nodes. I have a loop generating msgs to /var/log/messages constantly, and when I start filebeat it captures all new changes at the moment and submits it to LS, but after that happens it basically stalls until I reboot it, even if I take the LS it connected down.

Am I missing something ?

Here is my configuration:

filebeat:
spool_size: 5
prospectors:
-
paths:
- "/var/log/*.log"
- /var/log/messages
type: log
scan_frequency: 5s
idle_timeout: 10s

output:
logstash:
enabled: true
hosts:
- logstash.example.com:5044
loadbalance: true

ruflin · October 30, 2015, 7:04am

@jhidalgo Config looks good. Which system are you running it on? Can you start filebeat with -d "*" (debug output) option an post some log entries here as soon as filebeat stalls?

steffens · October 30, 2015, 4:06pm

I see your config having only one logstash instance configured. In this case, the load balancing mode will not be loaded/initialized. You need to have at least two instances configured. If filebeat stalls with only one host configured issue is not related to load-balancing node. But still, filebeat might stale, if all logstash instances known become unavailable. Have you tried running with -d '*,publish' ? Would be interesting to see some debug/error output.

Plus load balancing in filebeat is at the moment limited due filebeat waiting for ok from logstash in order to update some internal state on disk to deal with filebeat restarts. Still some remains originating from logstash-forward which might be addressed in the future. Related filebeat issue: https://github.com/elastic/filebeat/issues/106

jhidalgo · October 30, 2015, 4:20pm

Load Balancing:
I am using a DNS entry that responds with 4 A records. From an administrative point of view it is much better to add and remove IPs at the DNS level than changing configuration files for multiple services. I am assuming FileBeat will consider those multiple IPs, but please let me know if that's not the case.
File changes not detected. I am monitoring /var/log/messages with FB, and creating new entries with logger, but this is what the -d "*,publish" shows:
prospector.go:163: DBG Check file for harvesting: /var/log/messages
prospector.go:280: DBG Update existing file for harvesting: /var/log/messages
prospector.go:322: DBG Not harvesting, file didn't change: /var/log/messages
output.go:37: DBG output worker: no events to publish
output.go:37: DBG output worker: no events to publish

No matter how many lines I append to /var/log/messages, those changes are not detected by FB.

OS:
CentOS Linux release 7.1.1503 (Core)
Derived from Red Hat Enterprise Linux 7.1 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
Without considering load balancing, when FB starts it finds new entries and submits them to LS, but once all new entries have been found it seems to stall, not detecting or submitting new entries as I would expect considering the configuration settings.

Thanks for your support

steffens · October 30, 2015, 4:40pm

filebeat will get list of ips for every logstash host configured and chooses on IP by random retrying if connection fails.

steffens · October 30, 2015, 4:41pm

One more question: You using network shares or run in VM with sharing enabled?

jhidalgo · October 30, 2015, 5:23pm

AWS EC2 instance without network shares

ruflin · November 3, 2015, 7:43am

@jhidalgo Have a look at https://github.com/elastic/filebeat/issues/167 Do you have some encoding configured?

ceekay · November 3, 2015, 10:06pm

I'm not sure if this is related - happy to start a new thread if not.

I run a pipeline latency test for my ELK cluster; writing timestamp info to a file every second, processing it at each Logstash node to calculate the delay at that node, and graphing the results when they arrive at ES.

I've noticed that filebeat is not always detecting the log file has updated, even with scan_frequency set at 1s. Instead it's deciding the file hasn't changed, and then periodically sends batches of messages to catch up.

As you can imagine this is terrible for a latency test - the result is irregular sawtooth graphs.

I've pasted a debug output of the issue at http://pastebin.com/uwUuE00k

Note that the log file is indeed being updated every second. The script writing it is just a simple "echo / sleep 1" type of thing, and I've tailed it during this test to double check.

I'm also sure the scan_frequency setting doing what it says - the "Not harvesting, file didn't change" events occur every second. The problem appears to be that it's not detecting the log file has changed at every interval.

Edit: I should point out that this happens immediately on startup, not after any particular waiting period.

tudor · November 4, 2015, 10:26am

Hi @ceekay, as part of this PR we just merged, a configurable parameter was introduced to control how often we check an open file for new writes. It's called backoff and the default is 1s.

This change has sped up significantly our automated tests, so I have the feeling it will also help with the issue you are seeing. We'll have the option in RC1 that we are currently working on releasing, but I can also trigger a nightly build if you want to test before that.

Edit: actual link to the PR and some wording.

steffens · November 4, 2015, 12:07pm

@ceekay filebeat has an internal spooler collecting lines from multiple events before publishing (think collecting batches).

See config options spool_size and idle_timeout to tune the spooler. The smaller the batches the more processing overhead, potentially.

ceekay · November 4, 2015, 12:38pm

@tudor @steffens

Thanks for the updates. This sounds like it will resolve my issue - I'll check and let you know.

I understand the overhead increase for a very small spooler, but this case is specifically for pipeline latency monitoring and is quite low volume - one message per second, on one host.

lhm · November 4, 2015, 1:16pm

@steffens @ruflin I'm seeing the same behaviour as @jhidalgo - filebeat works as expected for a while, but then suddenly stops detecting log changes. After that, it continues to scan the files, but decides: "Not harvesting, file didn't change".

I'm using current head from the filebeat repo, because I needed the force_close_on_windows_files option. OS is Windows Server 2012 R2.

The prospectors are configured like this:

filebeat:
  prospectors:
    -
      paths:
        - c:\server\server1.log
      type: log
      fields:
        type: foo
      scan_frequency: 3s
      tail_on_rotate: false
      force_close_windows_files: true  
    -
      paths:
        - c:\server\server2.log
      type: log
      fields:
        type: foo
      scan_frequency: 3s
      tail_on_rotate: false
      force_close_windows_files: true

  # General filebeat configuration options
  registry_file: "C:/ProgramData/Filebeat/registry"

ceekay · November 5, 2015, 12:28am

@steffens

Setting spool_size to 1 immediately fixed my issue. Max latency time at the first node has reduced from ~8 seconds to 1 second, and min latency is ~0.05 secs, which is about the same as I was seeing before testing filebeat.

With spool_size at 1, I don't see much point in changing idle_timeout, as the output will fire as soon as an event is available.

Thanks very much!

andrewkroh · November 5, 2015, 9:30am

@lhm @jhidalgo I think I found the same problem. Take a look at this issue.

https://github.com/elastic/filebeat/pull/199

Topic		Replies	Views
Filebeat not seeing fileupdates Beats	3	698	July 5, 2017
Bug? Filebeat stops load balancing when volume is low Beats filebeat	1	374	December 17, 2019
Filebeat - Log Processing Issues/Delay/Data Loss Beats filebeat	0	106	October 16, 2024
Filebeat 1.1.0: Log Collection Halting Beats filebeat	18	2638	July 5, 2017
When filebeat has little work to do, it stops loadbalancing correctly Beats filebeat	1	303	December 13, 2019

File changes not detected after a while

Related topics