Filebeat holds on to a file descriptor too long even after a file was deleted?

Hello, Filebeat experts, I reported a similar problem before in "Filebeat might erroneously hold on to the file descriptor?"
Thanks @steffens for the tips!
I added the following to our config file back then, and it worked for us for a while. I did not have special settings for the other close_* and clean_* options, since the default values for those settings make sense to us.
close_removed: true

Our disk filled up with log files today. Even after we deleted the log files, Filebeat still holds on to their file descriptors.
Here is partial lsof output; note that Filebeat is holding on to deleted files.

filebeat 28834 app 4r REG 9,3 102244879937 2231478 mydir/logs/MY-APP/i001/MY-APP.log.2018-24-01 (deleted)
filebeat 28834 app 5r REG 9,3 302071876737 2231495 mydir/logs/MY-APP/i001/MY-APP.log.2018-25-01 (deleted)
filebeat 28834 app 29r REG 9,3 19266669215 2231492 mydir/logs/MY-APP/i001/MY-APP.log.2018-29-01 (deleted)
filebeat 28834 app 77r REG 9,3 11570936491 2231435 mydir/logs/MY-APP/i001/MY-APP.log (deleted)
filebeat 28834 app 96r REG 9,3 51866263552 2231429 mydir/logs/MY-APP/i001/MY-APP.log

I also copied the relevant lines from the Filebeat log below:
2018-01-28T23:19:56Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-28T23:26:29Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-29T00:07:49Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-29T04:43:20Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-29T04:57:07Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T00:56:23Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T01:01:56Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-30T02:05:30Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T02:12:36Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-30T02:47:03Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T16:53:03Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T16:58:24Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-30T17:04:24Z INFO Harvester started for file: my-dir/logs/MY-APP.log

We rotate logs daily. For example, after 1/29, MY-APP.log was rotated to MY-APP.log.2018-29-01, and after another day it was gzipped to MY-APP.log.2018-29-01.gz.

Two things in the logs look strange to me:

  1. From the lsof output, why does Filebeat still hold on to fds of files that were already deleted? I already set "close_removed: true", and lsof was run at least an hour after the log files were deleted.

  2. From the Filebeat logs, it only records opening MY-APP.log, never MY-APP.log.2018-29-01, so it seems like Filebeat opened MY-APP.log on Jan 29th and then held on to it even after it was renamed to MY-APP.log.2018-29-01?

Thanks for any pointers!

@filebeater Filebeat tracks the inode and the reading offset, not the filename, so when MY-APP.log gets renamed to MY-APP.log.2018-29-01, Filebeat will keep it open for some time and watch it, since data sometimes still gets written to these files during the transition.

$ touch test
$ ls -li test
4307006725 -rw-r--r--  1 ph  wheel  0 Jan 31 10:42 test
$ mv test test.old
$ ls -li test.old
4307006725 -rw-r--r--  1 ph  wheel  0 Jan 31 10:42 test.old
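
For reference, the close_* options are what bound how long Filebeat keeps such a handle open after a rename or a delete. Here is a minimal sketch with the documented defaults for the 5.x log prospector (the path is only illustrative, not taken from your config):

- input_type: log
  paths:
    - /mydir/logs/MY-APP/i001/MY-APP.log  # illustrative path
  close_inactive: 5m    # default: close the fd after 5m with no new data
  close_renamed: false  # default: keep reading a file after it is renamed
  close_removed: true   # default: close the fd once the file is deleted
  close_timeout: 0      # default: no hard cap on a harvester's lifetime

Keep in mind these options only close the handle; the registry entry is cleaned up separately by the clean_* options.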

Questions

  • Filebeat should release the deleted file eventually; is that the case?
  • It should also release APP.log.2018-29-01 eventually.

@pierhugues, thanks for the reply. Regarding "FB will keep it open for some time and watch it", could you clarify what "some time" means? Is it a few minutes, or a few days? It is keeping files from several days ago: on the 30th it still held the file descriptor for the log file from the 24th (2018-24-01).

filebeat 28834 app 4r REG 9,3 102244879937 2231478 mydir/logs/MY-APP/i001/MY-APP.log.2018-24-01 (deleted)
filebeat 28834 app 5r REG 9,3 302071876737 2231495 mydir/logs/MY-APP/i001/MY-APP.log.2018-25-01 (deleted)

Re: your questions below:

  1. Filebeat should release the deleted file eventually; is that the case?
    Could you clarify what "eventually" means? Is it a few minutes, a few hours, or a few days?
    Not within a couple of hours in our case. After we manually deleted the files, we had to kill Filebeat a couple of hours later, since it still had not released the fds by then and disk usage was at 100%, which affected our production.

  2. It should also release APP.log.2018-29-01 eventually.
    It did NOT release APP.log.2018-29-01, which is why I think this may be a bug in Filebeat.

That seems like a long time to keep a file open.

Do you have specific settings for ignore_older on your log prospector?
Could you share your config? I will take a look.

No, we do not have an ignore_older setting in our Filebeat config file. Thanks for looking into this!

I copied the filebeat.yml template file below:

- input_type: log

  paths:
    ${filebeat.log.paths}

  close_removed: true

  multiline.pattern: '^[[:space:]]+|^Caused by:|^Desired|^- |^: '
  multiline.negate: false
  multiline.match: after

processors:
  - drop_fields:
      fields: ["beat.name", "beat.version", "input_type", "@timestamp", "type", "offset"]

output.kafka:
  hosts: ["${kafka.logging.metadata.tls.broker.list}"]
  topic: '${filebeat.log.kafka.topic}'
  partition.hash:
    hash: ['beat.hostname']
    random: true # if false non-hashable events will be dropped
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000
  ssl.certificate_authorities: ["${kafka.ssl.ca}"]
  ssl.certificate: "${kafka.ssl.certificate}"
  ssl.key: "${kafka.ssl.key}"

Btw, I added more settings, hoping they will help with our disk-full issue (due to fds held by Filebeat). I am still confused about why Filebeat would hold on to the fd of a deleted file, given that "clean_removed" is enabled by default per https://www.elastic.co/guide/en/beats/filebeat/master/configuration-filebeat-options.html, unless the documentation is lying here :sweat_smile:

  close_removed: true

  # filebeat will focus on sending the current log files
  ignore_older: 36h
  close_inactive: 10m

  # set max lifetime of each harvester (fd)
  close_timeout: 30h
  clean_inactive: 48h
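
Putting it together, the prospector section now looks roughly like the sketch below (the path is only illustrative; our real config uses the template placeholder). If I read the docs right, clean_inactive has to be greater than ignore_older + scan_frequency, which 48h vs. 36h satisfies:

- input_type: log
  paths:
    - /mydir/logs/MY-APP/i001/MY-APP.log  # illustrative path
  close_removed: true    # close the fd when the file is removed
  close_inactive: 10m    # close the fd after 10m without new lines
  close_timeout: 30h     # hard cap on how long one harvester keeps its fd
  ignore_older: 36h      # do not start harvesting files older than 36h
  clean_inactive: 48h    # drop registry state; must be > ignore_older + scan_frequency
  clean_removed: true    # default: drop registry state for deleted files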

@filebeater I will do a bit more checking on my side.


@pierhugues thanks so much for taking the time to investigate this!
Btw, we use Beats version 5.5.1; please let me know if we should upgrade, and which stable version we should use. Thanks!

I am also wondering whether this bug is due to a race condition; I tested before on a local dev box, and there Filebeat did release the fd after a file was deleted.

@filebeater Can you try 6.1.x and see if you have the same behavior?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.