Filebeat harvester never stops running and doesn't detect file changes

We have Filebeat running and watching a single log file that is constantly updated, a few times per second. The log file is written by Python 3.6's logging.handlers.WatchedFileHandler and is rotated by logrotate once it reaches 1 GB. For the purposes of this issue I have disabled logrotate completely to rule it out as a factor.

Operating System:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"

Beats Version:

5.2.2

Config:

filebeat:
  prospectors:
    -
      paths:
        - /logs/my-app/my-app*.jsonlog
      fields_under_root: true
      input_type: log
      document_type: customlogtype
      tail_files: false
      type: log
      fields:
        a: bunch
        of: fields
        with: extra
        info: banana

output:
  logstash:
    hosts: ["192.51.100.1:5000"]

logging.level: debug
logging.to_files: true
logging.to_syslog: false
logging.files:
  path: /var/log/filebeat
  name: mybeat.log
  keepfiles: 7

Logs:

2017-03-02T01:14:02Z DBG  Run prospector
2017-03-02T01:14:02Z DBG  Start next scan
2017-03-02T01:14:02Z DBG  Check file for harvesting: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:02Z DBG  Update existing file for harvesting: /logs/my-app/my-app-PID54.jsonlog, offset: 17268771
2017-03-02T01:14:02Z DBG  Harvester for file is still running: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:02Z DBG  Prospector states cleaned up. Before: 1, After: 1
2017-03-02T01:14:12Z DBG  Run prospector
2017-03-02T01:14:12Z DBG  Start next scan
2017-03-02T01:14:12Z DBG  Check file for harvesting: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:12Z DBG  Update existing file for harvesting: /logs/my-app/my-app-PID54.jsonlog, offset: 17268771
2017-03-02T01:14:12Z DBG  Harvester for file is still running: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:12Z DBG  Prospector states cleaned up. Before: 1, After: 1
2017-03-02T01:14:16Z DBG  41 events out of 1822 events sent to logstash. Continue sending
2017-03-02T01:14:16Z DBG  Try to publish 1781 events to logstash with window size 41
2017-03-02T01:14:22Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.write_bytes=9534 libbeat.logstash.publish.read_bytes=12
2017-03-02T01:14:22Z DBG  Run prospector
2017-03-02T01:14:22Z DBG  Start next scan
2017-03-02T01:14:22Z DBG  Check file for harvesting: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:22Z DBG  Update existing file for harvesting: /logs/my-app/my-app-PID54.jsonlog, offset: 17268771
2017-03-02T01:14:22Z DBG  Harvester for file is still running: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:22Z DBG  Prospector states cleaned up. Before: 1, After: 1
2017-03-02T01:14:32Z DBG  Run prospector
2017-03-02T01:14:32Z DBG  Start next scan
2017-03-02T01:14:32Z DBG  Check file for harvesting: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:32Z DBG  Update existing file for harvesting: /logs/my-app/my-app-PID54.jsonlog, offset: 17268771
2017-03-02T01:14:32Z DBG  Harvester for file is still running: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:32Z DBG  Prospector states cleaned up. Before: 1, After: 1
2017-03-02T01:14:33Z DBG  41 events out of 1781 events sent to logstash. Continue sending
2017-03-02T01:14:33Z DBG  Try to publish 1740 events to logstash with window size 41
2017-03-02T01:14:42Z DBG  Run prospector
2017-03-02T01:14:42Z DBG  Start next scan
2017-03-02T01:14:42Z DBG  Check file for harvesting: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:42Z DBG  Update existing file for harvesting: /logs/my-app/my-app-PID54.jsonlog, offset: 17268771
2017-03-02T01:14:42Z DBG  Harvester for file is still running: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:42Z DBG  Prospector states cleaned up. Before: 1, After: 1
2017-03-02T01:14:50Z DBG  41 events out of 1740 events sent to logstash. Continue sending
2017-03-02T01:14:50Z DBG  Try to publish 1699 events to logstash with window size 41
2017-03-02T01:14:52Z INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=12 libbeat.logstash.publish.write_bytes=9192
2017-03-02T01:14:52Z DBG  Run prospector
2017-03-02T01:14:52Z DBG  Start next scan
2017-03-02T01:14:52Z DBG  Check file for harvesting: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:52Z DBG  Update existing file for harvesting: /logs/my-app/my-app-PID54.jsonlog, offset: 17268771
2017-03-02T01:14:52Z DBG  Harvester for file is still running: /logs/my-app/my-app-PID54.jsonlog
2017-03-02T01:14:52Z DBG  Prospector states cleaned up. Before: 1, After: 1

As you can see in the logs, the harvester reads the file and records the offset, but the offset never increments and the harvester never stops running. Before it gets stuck in this loop, all of the log lines up to the offset byte are shipped correctly to Elasticsearch.
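For reference, these are the prospector settings that govern this scan/harvest loop, with the defaults that match what the logs show (a minimal sketch using Filebeat 5.x option names and illustrative values, not a confirmed fix):

filebeat:
  prospectors:
    -
      paths:
        - /logs/my-app/my-app*.jsonlog
      input_type: log
      # How often the prospector rescans the path; the default of 10s matches
      # the "Start next scan" interval in the log above.
      scan_frequency: 10s
      # Close a harvester whose file has not grown for this long (default 5m).
      close_inactive: 5m
      # Force-close a harvester after this long even if it still looks busy.
      # This option should exist in 5.x, but check your exact release.
      close_timeout: 30m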

Is filebeat running on the same machine that creates the logs? Do you use shared volumes?

I'm having the same issue with 5.2.1. I'm running filebeat and the log producer in separate rkt containers with a shared Nomad allocation directory mounted into both containers.

$ jq .[0] filebeat/local/data/registry 
{
  "source": "/alloc/logs/nomad_events.json",
  "offset": 338650,
  "FileStateOS": {
    "inode": 1703953,
    "device": 51824
  },
  "timestamp": "2017-03-08T15:23:43.587490845Z",
  "ttl": -1
}
$ stat alloc/logs/nomad_events.json 
  File: ‘alloc/logs/nomad_events.json’
  Size: 863582    	Blocks: 1696       IO Block: 4096   regular file
Device: ca70h/51824d	Inode: 1703953     Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-03-08 16:15:26.272448002 +0000
Modify: 2017-03-08 16:30:31.792756808 +0000
Change: 2017-03-08 16:30:31.792756808 +0000
 Birth: -

So the inode and device are the same, but filebeat stopped doing anything useful with the file more than an hour ago. The last output from filebeat is

2017/03/08 15:23:40.118570 logp.go:230: INFO Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=517 libbeat.logstash.publish.read_bytes=6 registrar.writes=1 libbeat.logstash.published_and_acked_events=1 registrar.states.update=1 libbeat.publisher.published_events=1 publish.events=1
2017/03/08 15:23:42.350661 log.go:84: INFO Harvester started for file: /alloc/logs/nomad_events.json
2017/03/08 15:28:53.872405 logp.go:230: INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.write_bytes=21458 libbeat.logstash.publish.read_bytes=36 registrar.writes=1 publish.events=39 filebeat.harvester.running=1 libbeat.logstash.published_and_acked_events=38 registrar.states.update=39 filebeat.harvester.open_files=1 filebeat.harvester.started=1 libbeat.logstash.call_count.PublishEvents=2 libbeat.publisher.published_events=51
2017/03/08 15:28:53.872472 log.go:109: INFO File is inactive: /alloc/logs/filebeat.stdout.0. Closing because close_inactive of 5m0s reached.
2017/03/08 15:33:54.073369 logp.go:230: INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_bytes=6 filebeat.harvester.running=-1 filebeat.harvester.open_files=-1 filebeat.harvester.closed=1 libbeat.logstash.publish.write_bytes=1700
2017/03/08 15:33:54.074654 log.go:84: INFO Harvester started for file: /alloc/logs/filebeat.stdout.0

My filebeat config is

fields:
    nomad:
        job_name:    "nomad-watcher"
        alloc_name:  "nomad-watcher.watcher[0]"
        alloc_id:    "b9e850e9-4de4-2f8f-7837-fa40c06931a9"
        alloc_index: 0

filebeat.prospectors:
    -   input_type: "log"
        document_type: "stdio"
        paths:
            - "/alloc/logs/*.stdout.0"
            - "/alloc/logs/*.stderr.0"
    
    -   input_type: "log"
        document_type: "nomad_events"
        json:
            keys_under_root: true
            add_error_key: true
        paths:
            - "/alloc/logs/nomad_events.json"

output.logstash:
    enabled: true
    loadbalance: true
    hosts:
        - 10.112.2.175:5121
        - 10.112.2.154:5121

My filebeat command line is

/filebeat/filebeat -e -path.home /filebeat -path.config /local -path.data /local/data

Result of sending SIGQUIT to the filebeat process: https://gist.github.com/blalor/e06e90e10cf4f664446e6cf08087227b

Ooh, this may have been a memory allocation issue. I was only giving filebeat 10 MB and, with -d "*" enabled, I saw an error (now scrolled off-screen) about not being able to allocate memory for a prospected file. Filebeat didn't exit, however; it just hung.

I assume what is happening in your case is that filebeat cannot allocate more memory or file handles and just waits until more memory becomes available again. It would be nice to see the exact error that filebeat reported, for further investigation.
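For what it's worth, if memory pressure is the suspect, the amount each harvester buffers and the number of harvesters started in parallel can both be capped per prospector (a sketch using Filebeat 5.x option names with illustrative values; check that they are available in your release):

filebeat.prospectors:
    -   input_type: "log"
        paths:
            - "/alloc/logs/*.stdout.0"
            - "/alloc/logs/*.stderr.0"
        # Bytes each harvester reads into memory per read call (default 16384).
        harvester_buffer_size: 16384
        # Maximum number of harvesters started in parallel for this prospector
        # (0 means no limit).
        harvester_limit: 2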
