Filebeat harvesters don't restart after stopping due to ignore_older

I've been trawling these topics and may have found a couple of issues that sound similar but I can't figure out what the expected behaviour is in the following use case (which seems quite straight forward):

I have some nginx log files that currently don't see a lot of activity and have logrotate set to 1 day. So what happens is this:

  • I restart filebeat to reset everything
  • I visit my site and generate some logs
  • everything is fine and the log entries are shipped to Logstash
  • I stop visiting my site for a few days
  • logrotate rotates the logs at the normal time
  • After 24 hours (default ignore_older setting) I see the File reading error. Stopping harvester. Error: Stop harvesting as file is older then ignore_older message in my syslog
    • I think this is actually from the rotated log file (ie. *access.log.1)
  • The next log rotation does nothing as the new log file is still empty
  • I get another File reading error. Stopping harvester. Error: Stop harvesting as file is older then ignore_older message in my syslog
    • Pretty sure this is for the new empty log file as it's been 24 hours since it was created
  • a few days later I visit my site again and generate some new log entries

This is what I expect:

  • Within 10 seconds (the default scan frequency) the log file is seen as having been updated and the harvester starts again
  • The new log entries are shipped to Logstash and appear in my Kibana view

This is what happens:

  • Nothing
  • I waited quite a while and nothing comes through
  • Other files (eg. syslog) are continuing to be shipped (although configured on a different prospector to differentiate type)
  • if I restart Filebeat then the harvesters start again and the log entries get shipped

I'm pretty sure this is not the expected behaviour. I considered upping the ignore_older to greater than the logrotate time but then i will also have to configure logrotate to rotate empty files or it'll go wrong anyway

btw, i'm using Filebeat 1.0.1 on Ubuntu

Ok just spotted this one:

And it seems that my expectation is correct and the harvester should restart.

Is this a bug or a misconfiguration on my part - I wondered if it was a corner case due to the file being empty when it is ignored :s

My configuration is really simple

---
filebeat:
  prospectors:
  - paths:
    - "/var/log/nginx/*access.log"
    document_type: nginx_access

I am using a config_dir to define prospectors in separate files.

This is my main config (comments removed for brevity) - it's mostly defaults

filebeat:
  prospectors:
    -
      type: stdin
      paths:
        - "-"

  registry_file: /var/lib/filebeat/registry
  config_dir: /etc/filebeat/conf.d

output:
  logstash:
    hosts: ["logstash.myserver.com:5044"]

shipper:

logging:
  files:
    rotateeverybytes: 10485760 # = 10MB

The weirdest thing is the stdin prospector - this is because I use Chef to add prospector files and initially there are no prospectors - Filebeat doesn't like it when you try to start it with no prospectors (I will raise this as a separate issue)

Oh, and the other prospectors are defined thus:

---
filebeat:
  prospectors:
  - paths:
    - "/var/log/nginx/*error.log"
    document_type: nginx_error
---
filebeat:
  prospectors:
  - paths:
    - "/var/log/syslog"
    document_type: syslog

I still have to look more in detail into the problem to see what the issue is. Have a look at the discussion here to see if your case goes under the ignore_older problem here: https://github.com/elastic/filebeat/issues/181

I remember an issue popping up that filebeat stops when there are not prospectors starting. If this is the case, please open an issue.

I have some more info - I turned on debug and created a really simple test case with this configuration:

filebeat:
  prospectors:
    - paths:
        - "/var/log/test/*.log"
      ignore_older: 30s
      scan_frequency: 5s

  registry_file: /var/lib/filebeat/registry

output:
  logstash:
    hosts: ["logstash.myserver.com:5044"]

shipper:

logging:
  level: debug
  to_files: true
  files:
    path: /var/log/filebeat
    name: filebeat.log
    rotateeverybytes: 10485760 # = 10MB

I then ran some tests with these 30s and 5s timings. This is what I found:

  1. mkdir -p /var/log/test
  2. Start Filebeat with no log files in /var/log/test/
  3. cd /var/log/test
  4. echo "hello" >> hello.log
  5. Line is picked up and published
  6. Wait 30s for ignore_older to kick in
  7. echo "hello" >> hello.log
  8. Line is picked up and published
  9. mv hello.log hello.log.1 && echo "hello" >> hello.log
  10. Line is picked up and published
  11. mv hello.log hello.log.1 && touch hello.log
  12. Wait 30s for ignore_older to kick in
  13. echo "hello" >> hello.log
  14. Line is NOT picked up and published

There are no additional errors in the debug just lots of this:

2015-12-28T14:50:51Z DBG  Start next scan
2015-12-28T14:50:51Z DBG  scan path /var/log/test/*.log
2015-12-28T14:50:51Z DBG  Check file for harvesting: /var/log/test/hello.log
2015-12-28T14:50:51Z DBG  Update existing file for harvesting: /var/log/test/hello.log
2015-12-28T14:50:51Z DBG  Not harvesting, file didn't change: /var/log/test/hello.log
2015-12-28T14:50:55Z DBG  Flushing spooler because of timemout. Events flushed: 0

This seems to be 100% reproducible (4 out of 4 so far)

The key seems to be waiting for ignore_older to take effect after the log rotation but before any new lines have been added to the new log file

I'll transcribe this over to github

https://github.com/elastic/filebeat/issues/320 added