"Not resuming rotated file" and filebeat doesn't close rotated file


#1

Hi,

i'm using filebeat 1.2.3 and i do not understand a behavior on rotated files that lead my monitoring system to be unreliable from two point of view: events are not sent properly and rotated files are not closed properly. I've the following test case:

my config file

filebeat:
  registry_file: /var/run/filebeat
  publish_async: false
  idle_timeout: 10s
  spool_size: 2048

  prospectors:
    -
      input_type: log
      force_close_files: true
      multiline:
        negate: true
        pattern: "^{"
        match: after
      paths:
        - /var/log/test.log
      close_older: 5m

output:
  logstash:
    max_retries: 1000000
    worker: 5
    timeout: 10
    bulk_max_size: 8192
    loadbalance: true
    hosts: ["localhost:5044"]

logging:
   level: info

   # enable file rotation with default configuration
   to_files: true

   # do not log to syslog
   to_syslog: false

   files:
     path: /var/log/filebeat
     name: filebeat.log
     keepfiles: 10
     rotateeverybytes: 104857600

Logging after log rotation

bash$> cat /var/log/filebeat/filebeat.log | grep test.log
2016-06-15T08:06:04+02:00 INFO Harvester started for file: /var/log/test.log
2016-06-15T14:42:55+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:42:55+02:00 INFO Harvester started for file: /var/log/test.log
2016-06-15T14:43:05+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:43:15+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:43:35+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:43:45+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:43:55+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:44:05+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:44:15+02:00 INFO Not resuming rotated file: /var/log/test.log
2016-06-15T14:44:25+02:00 INFO Not resuming rotated file: /var/log/test.log
....

List of open file at 2016-06-15T18:44:25+02:00

bash$> lsof -p 43117 
filebeat 43117 root   21r   REG      8,17  6237614492  2883693 /var/log/test.log.1 (deleted)
filebeat 43117 root   41r   REG      8,17  4395389695  2884028 /var/log/test.log

Some analysis on the state file

At this point i've analyzed changes on state file and i've fount a strange behaviour. With the following command i'm monitoring changes on reference of the inode and offset and we can see that the inode reference flaps betweend the deleted rotated file and the current one.

bash$> while true; do (cat /var/run/filebeat | jq . | grep -B 6 -P "source.*test.log" && sleep 2); done
  "/var/log/test.log": {
    "FileStateOS": {
      "device": 2065,
      "inode": 2883693
    },
    "offset": 1867412648,
    "source": "/var/log/test.log"
  "/var/log/test.log": {
    "FileStateOS": {
      "device": 2065,
      "inode": 2883693
    },
    "offset": 1867412648,
    "source": "/var/log/test.log"
  "/var/log/test.log": {
    "FileStateOS": {
      "device": 2065,
      "inode": 2884028
    },
    "offset": 410248229,
    "source": "/var/log/test.log"
  "/var/log/test.log": {
    "FileStateOS": {
      "device": 2065,
      "inode": 2884028
    },
    "offset": 410294689,
    "source": "/var/log/test.log"
  "/var/log/test.log": {
    "FileStateOS": {
      "device": 2065,
      "inode": 2884028
    },
    "offset": 410294689,
    "source": "/var/log/test.log"

One little detail that can be useful: the process writing on /var/log/test.log after log rotation write correcty to the new file. Is there an issue in my configuration?


(ruflin) #2

Can you change your config so that filebeat also can see the rotate files? Means changing

paths:
- /var/log/test.log

to

paths:
- /var/log/test.log*

And remove force_close_files?

How is your log rotation done? Rename the file and create a new one? When do you delete the oldest rotated file?

Could you also try if 5.0.0-alpha3 resolves your issue?

Thanks for sharing all the details with inode and device, that is very helpful. One note on the Not resuming rotated file: /var/log/test.log log line. This is probably misleading, as the harvester actually continued on the rotated file but because filebeat can't find the new file, path name was not updated.

For the open files: You have 2 open files in your list. How old is the file that was deleted? Is it older the close_older?


#3

I'm going to test your suggested changes in the next few hours with filebeat 1.2.3. thank you.
The logrotation is done with logrotate: the old one is renamed and compressed and the logging process create the new one.

The deleted file was deleted at 14:42:55+02:00, the close_older is set to 5m, but at 18:44:25 the deleted file is still opened..


#4

Hi,
changing path to /var/log/test.log*, i've different behaviour. If i not compress files after the rotation it works fine, but if they are compressed the problem seems to be the same.
The issue can be trigger in situation of congestion of logstash (process produces more logs that logstash and ES can elaborate and store) or when there are connection problems with logstash.


(ruflin) #5

Thanks for the testing. It seems like there is a problem with closing the file if it is not found anymore :frowning: Obviously it should also close the file when you compress it (even though it didn't finish reading).

Could you try if this error persists in 5.0.0-alpha3? I will try reproduce this one my side. Do you directly compress the oldest file after rotation?


(system) #6

This topic was automatically closed after 21 days. New replies are no longer allowed.