Hello, filebeat experts, I reported a similar problem before in Filebeat might erroneously hold on to the file descriptor?
thanks @steffens for the tips!
I added the following to our config file back then, which worked for us for a while. I did not have special settings for close-* and clean_* options. since the default values for those settings make sense to us.
close_removed: true
our disk space was filled up by log files today. Even after deleting log files, filebeat still holds on to those file descriptors.
Here are partial lsof output, note that filebeat holds on to deleted files.
I also copied relevant lines from filebeat log below:
2018-01-28T23:19:56Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-28T23:26:29Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-29T00:07:49Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-29T04:43:20Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-29T04:57:07Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T00:56:23Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T01:01:56Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-30T02:05:30Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T02:12:36Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-30T02:47:03Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T16:53:03Z INFO Harvester started for file: my-dir/logs/MY-APP.log
2018-01-30T16:58:24Z INFO File is inactive: my-dir/logs/MY-APP.log. Closing because close_inactive of 5m0s reached.
2018-01-30T17:04:24Z INFO Harvester started for file: my-dir/logs/MY-APP.log
We rotate logs daily, for example, after 1/29, MY-APP.log was rotated to MY-APP.log.2018-29-01, then after another day, it was gziped to MY-APP.log.2018-29-01.gz.
Two things in the logs look strange to me:
from lsof output, why filebeat still holds on to fds that were already deleted? I already set "close_removed: true" lsof was run after the log files were deleted at least more than an hour ago.
from filebeat logs, it only records open MY-APP.log, not MY-APP.log.2018-29-01, so this seems like filebeat opens MY-APP.log on Jan 29th, then holds to it even after it was renamed to MY-APP.log.2018-29-01?
@filebeater Filebeat tracks inode changes + reading offset, not the filename, so when MY-APP.log get renamed to MY-APP.log.2018-29-01 FB will keep it open for some time and watch it, sometimes data still get written to these files during transitions.
touch test
ls -li test
4307006725 -rw-r--r-- 1 ph wheel 0 Jan 31 10:42 test
mv test test.old
ls -li test.old
4307006725 -rw-r--r-- 1 ph wheel 0 Jan 31 10:42 test.old
Questions
FB should release the deleted file eventually, it is the case?
@pierhugues, thanks for the reply, by "FB will keep it open for some time and watch it", could you clarify what is "some time"? is it a few minutes? or a few days? since it keep logs from a few days ago. On day 30th, it still kept the file descriptor from the log file for day 24th (2018-24-01).
FB should release the deleted file eventually, it is the case?
Could you clarify what is eventually? is it after "a few minutes"? "a few days" or "a few hours"?
Not within a couple of hours in our case. After we manually deleted the files, we had to kill filebeat a couple of hours after that, since filebeat did not release fds by then and disk usage was 100% which affected our production.
It should also release the APP.log.2018-29-01
It did NOT release APP.log.2018-29-01, which is why I thought it may be a bug in filebeat?
btw, I added more settings, hope it will helps with our disk full issue (due to fds held by filebeat). I am still confused by why filebeat would hold on the fd of the deleted file given that "clean_removed" is enabled by default per https://www.elastic.co/guide/en/beats/filebeat/master/configuration-filebeat-options.html. or the documentation is lying here
close_removed: true
# filebeat will focus on sending the current log files
ignore_older: 36h
close_inactive: 10m
# set max lifetime of each harvester (fd)
close_timeout: 30h
clean_inactive: 48h
@pierhugues thanks so much in taking the time to investigate this!
btw, we use beats version 5.5.1, please let me know if we should update it, and which stable version we should use? thanks!
I am also wondering whether this bug was due to a race condition, i tested before that filebeat can release fd after a file is deleted in a local dev box.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.