Filebeat Holds open rotated log files

On our kubernetes clusters we have an issue caused by filebeat holding open rotated log files for too long. Filebeat needs a more native understanding of log file rotation.

Docker is configured to rotate logs at 20meg and keep 10 logs.

1 A container writes log messages.
2 Filebeat notices the new log /var/lib/docker/containers//-log, creates it's internal index for the file and starts watching it.
3 the container writes a lot more logs.
4 Docker rotates the log file creating /var/lib/docker/containers//-log.2.
5 Filebeat notices that the file is rotated and updates it's internal index to match the new file name.
6 This repeats until the file is named /var/lib/docker/containers//-log.9.
7 The container continues to write logs.
8 Docker now rotates the logs one more time deleting the current file /var/lib/docker/containers//-log.9 and renaming /var/lib/docker/containers//-log.8 to /var/lib/docker/containers//-log.9.
9 Filebeat's periodic scan goes through /var/lib/docker/containers/ and updates it's index. It finds there is a file /var/lib/docker/containers//-log.9 which matches one of it's indexed files and calls that index current. THIS IS WRONG. This is not the same file which was originally opened.
10 filebeat continues to hold open the filehandle to this deleted file because according to it's index there is still a file named /var/lib/docker/containers//-log.9.
11 Eventually close_inactive expires for the open file handle and filebeat closes this file finally releasing the disk space.

I see two potential fixes for this:

  1. Have a better understanding of log file rotation. I wouldn't want to close the filehandle the moment it is renamed to log.2 because docker could still be writing to it or filebeat might be a little behind. But once it has been renamed AND we have reached EOF docker is never going to re-open this file and it is now time to close it.
  2. Index based on filehandle. Once the file has been deleted we should close it. close_deleted is enabled by default. It's just failing to notice that the file has actually been deleted.

Hi @aforster,

Could you please paste the Filebeat configuration you are using? Or is this just the default example manifest we provide?

Best regards

This occurs with the default manifest.

filebeat.yml

name: "${KUBERNETES_NODE_NAME}"

filebeat.prospectors:

  • type: log
    max_bytes: 200000
    paths:
    • '/var/lib/docker/containers//-json.log'
      \ fields:
      cluster_id: mycluster-1
      cluster_alias: cluster-1
      fields_under_root: true

processors:

output.console:
enabled: false
pretty: true

output.elasticsearch:
enabled: true
hosts: ['${FILEBEAT_ENDPOINT}']
\ compression_level: 5
bulk_max_size: 49
template.enabled: false
headers:
\ X-API-Key: '${ES_API_KEY}'

logging.to_files: true

logging.level: debug \

logging.files:
rotateeverybytes: 16777216
keepfiles: 4

To verify:

  1. In one window loop every 10 seconds and perform an lsof of /var/lib/docker grepping for test-json (a file I will create.)
  2. In a second session create a file /var/lib/docker/containers/test/test-json.log and write a message to it every 20 seconds. We kill this once filebeat opens the file for reading.
  3. Once filebeat has opened the file for reading delete that file, immediately create a new one with the exact same name then write to the new file every 20 seconds like we are logging normally.

Results for the above:
In one window loop every 10 seconds and perform an lsof of /var/lib/docker grepping for test-json (a file I will create.)
Significant events are highlighted with comments. The commands used to create and delete files are below. this block.

myhost ~ # while true; do date; lsof /var/lib/docker 2>/dev/null | grep test-json; sleep 10; done
Mon Jul  9 17:19:57 UTC 2018
Mon Jul  9 17:20:07 UTC 2018
Mon Jul  9 17:20:17 UTC 2018
Mon Jul  9 17:20:28 UTC 2018
# Test file is created and filebeat has opened it.
# Note the filehandle (11r)
Mon Jul  9 17:20:38 UTC 2018
filebeat  37201   root   11r   REG 202,240       18 19923130 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:20:48 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:20:58 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:21:09 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log
# Test file has been deleted and recreated. We are now writing to the new file not the deleted file.
# Note the filehandle of the deleted file is the same as the original.
# Note the filehandle of the new file is different (18r.)
# Note the seventh column over which is the file size. The deleted file is 30 bytes, the new file is 6.
Mon Jul  9 17:21:19 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240        6 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:21:29 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240        6 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:21:39 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240        6 19923131 /var/lib/docker/containers/test/test-json.log
# Note that the new file (filehandle 18r)  is growing but the original (11r) is not being written to.
Mon Jul  9 17:21:49 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       12 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:22:00 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       12 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:22:10 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       12 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:22:20 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       18 19923131 /var/lib/docker/containers/test/test-json.log
<trunkated for brevity>
Mon Jul  9 17:25:24 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       54 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:25:34 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       54 19923131 /var/lib/docker/containers/test/test-json.log
Mon Jul  9 17:25:45 UTC 2018
filebeat  37201   root   11r   REG 202,240       30 19923130 /var/lib/docker/containers/test/test-json.log (deleted)
filebeat  37201   root   18r   REG 202,240       60 19923131 /var/lib/docker/containers/test/test-json.log
# Five minutes have passed and the original file (filehandle 11r) is now finally closed.
Mon Jul  9 17:25:55 UTC 2018
filebeat  37201   root   18r   REG 202,240       60 19923131 /var/lib/docker/containers/test/test-json.log

In a second session create a file /var/lib/docker/containers/test/test-json.log and write a message to it every 20 seconds. We kill this once filebeat opens the file for reading.

myhost test # date ; while true; do echo "stuff" >> test-json.log ; sleep 5; done
Mon Jul  9 17:20:31 UTC 2018

Once filebeat has opened the file for reading delete that file, immediately create a new one with the exact same name then write to the new file every 20 seconds like we are logging normally.

myhost test # date ; rm test-json.log && touch test-json.log ;while true; do echo "stuff" >> test-json.log; sleep 30; done
Mon Jul  9 17:21:11 UTC 2018

Any reason not to open a bug for this one?

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.