Hello, Filebeat experts,
we just encountered a disk usage issue after deploying filebeat to production a few weeks.
We set up filebeat to ship log files (file with extension .log) to kafka. The log files are rotated every day, and appended with the timestamp (e.g., XXX.log.datestamp). Log files more than 5 days old will be deleted.
It seems like, on a couple of machines, even though the old log files had been delete by cron jobs, they were still opened by filebeat. As a result the disk space was not freed up, causing disk nearly full. After we stopped the filebeat service on these machines, then the disk was free up by nearly 50%.
I have two questions:
(1) when would filebeat release a file descriptor?
An desirable behavior to us is: For the files being renamed or deleted, filebeat release the fd once it completes processing the file content existed before the renaming/deleting.
btw, I listed all filebeat log lines related to the fd being held by filebeat after deletion. what looks strange to me is that "Harvester started for file" and "Closing" does not match. There are 11 of "Harvester started" and only 5 of "..Closing..". I would expect one more "started" than "closing", why so many "Harvester started" than ""..Closing.."?
grep "READER-0.log" filebeat*
filebeat.2:2017-10-13T16:37:25Z INFO Harvester started for file: READER-0.log
filebeat.2:2017-10-14T00:00:10Z INFO Harvester started for file: READER-0.log
filebeat.2:2017-10-15T00:00:05Z INFO Harvester started for file: READER-0.log
filebeat.2:2017-10-15T16:13:03Z INFO File is inactive: READER-0.log. Closing because close_inactive of 5m0s reached.
filebeat.2:2017-10-16T00:00:07Z INFO Harvester started for file: READER-0.log
filebeat.2:2017-10-16T01:37:23Z INFO File is inactive: READER-0.log. Closing because close_inactive of 5m0s reached.
filebeat.2:2017-10-16T18:12:02Z INFO File is inactive: READER-0.log. Closing because close_inactive of 5m0s reached.
filebeat.2:2017-10-17T00:00:00Z INFO Harvester started for file: READER-0.log
filebeat.1:2017-10-18T00:00:03Z INFO Harvester started for file: READER-0.log
filebeat.1:2017-10-19T00:00:01Z INFO Harvester started for file: READER-0.log
filebeat.1:2017-10-19T23:58:28Z INFO File is inactive: READER-0.log. Closing because close_inactive of 5m0s reached.
filebeat.1:2017-10-20T00:00:07Z INFO Harvester started for file: READER-0.log
filebeat.1:2017-10-21T00:00:06Z INFO Harvester started for file: READER-0.log
filebeat:2017-10-21T17:00:42Z INFO File is inactive: READER-0.log. Closing because close_inactive of 5m0s reached.
filebeat:2017-10-22T00:00:09Z INFO Harvester started for file: READER-0.log
filebeat:2017-10-23T00:00:02Z INFO Harvester started for file: READER-0.log
I also log into the machines that do not have the disk full issue, the number of "Harvester started for file" log lines matches better with the number of "Closing" lines, e..g, 11 of "Harvester started" lines and 9 of "Closing" lines
(2) under what scenarios filebeat might erroneously hold on to the file descriptors?
thanks,
yan