Filebeat no releasing deleted files

Hello,

I could not find a post with my current specific problem so creating a new one :slight_smile:

I use Filebeat version 6.8.2 running on Debian 9

I have problems with Filebeat on some machines where Filebeat keeps deleted files open.

# lsof -p $(pidof filebeat)
COMMAND     PID USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME
filebeat 271341 root  cwd       DIR              254,0      4096          2 /
filebeat 271341 root  rtd       DIR              254,0      4096          2 /
filebeat 271341 root  txt       REG              254,0  37206677     408973 /usr/share/filebeat/bin/filebeat
filebeat 271341 root  mem       REG              254,0     47632     795464 /lib/x86_64-linux-gnu/libnss_files-2.24.so
filebeat 271341 root  mem       REG              254,0     47688     795466 /lib/x86_64-linux-gnu/libnss_nis-2.24.so
filebeat 271341 root  mem       REG              254,0     89064     795461 /lib/x86_64-linux-gnu/libnsl-2.24.so
filebeat 271341 root  mem       REG              254,0     31616     795462 /lib/x86_64-linux-gnu/libnss_compat-2.24.so
filebeat 271341 root  mem       REG              254,0   1689360     795454 /lib/x86_64-linux-gnu/libc-2.24.so
filebeat 271341 root  mem       REG              254,0     14640     795457 /lib/x86_64-linux-gnu/libdl-2.24.so
filebeat 271341 root  mem       REG              254,0    135440     795469 /lib/x86_64-linux-gnu/libpthread-2.24.so
filebeat 271341 root  mem       REG              254,0    153288     795450 /lib/x86_64-linux-gnu/ld-2.24.so
filebeat 271341 root    0r      CHR                1,3       0t0       1029 /dev/null
filebeat 271341 root    1u     unix 0xffff90ba02411800       0t0  917142052 type=STREAM
filebeat 271341 root    2u     unix 0xffff90ba02411800       0t0  917142052 type=STREAM
filebeat 271341 root    3w      REG              254,0   2758858     522267 /var/log/filebeat/filebeat
filebeat 271341 root    4u  a_inode               0,11         0      10698 [eventpoll]
filebeat 271341 root    5r      REG                8,1 536870985 1610612839 /data/disk0/logs/my-service/my-node-3/my-service.log.1 (deleted)
filebeat 271341 root    6r      REG                8,1 536871202  536871013 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
filebeat 271341 root    7r      REG                8,1 441577681       1083 /data/disk0/logs/my-service/my-node-0/my-service.log
filebeat 271341 root    8u      REG              254,0     16284     522861 /var/lib/filebeat/registry.new
filebeat 271341 root    9r      REG              254,0     32928     523019 /var/log/faillog
filebeat 271341 root   10r      REG                8,1 536871156 1074657635 /data/disk0/logs/my-service/my-node-2/my-service.log.10
filebeat 271341 root   11r      REG              254,0  55786770     524366 /var/log/auth.log
filebeat 271341 root   12r      REG                8,1 536871216        116 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   13r      REG                8,1 536871257 1610612841 /data/disk0/logs/my-service/my-node-3/my-service.log.1 (deleted)
filebeat 271341 root   14r      REG                8,1 536871073        109 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   15u     IPv4          917246061       0t0        TCP in1855.my-service0.hay0.bwcom.net:54398->logstash12.log1.hay0.bwcom.net:5044 (ESTABLISHED)
filebeat 271341 root   16r      REG                8,1 536871057  536871016 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
filebeat 271341 root   17r      REG                8,1 536871173        110 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   18r      REG                8,1 536871324        118 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   19r      REG                8,1 536870932  536871017 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
filebeat 271341 root   20r      REG                8,1 536871072        113 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   21r      REG                8,1 536871332  536871019 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
filebeat 271341 root   22r      REG                8,1 536871139        114 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   23r      REG                8,1 536871157  536871020 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
filebeat 271341 root   24r      REG                8,1 536876004        115 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   25r      REG                8,1 536871222  536871021 /data/disk0/logs/my-service/my-node-1/my-service.log.10
filebeat 271341 root   26r      REG                8,1 536871330  536871022 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
filebeat 271341 root   27r      REG                8,1 536871574        119 /data/disk0/logs/my-service/my-node-4/my-service.log.1 (deleted)
filebeat 271341 root   28r      REG                8,1 536871029  536871023 /data/disk0/logs/my-service/my-node-5/my-service.log.1 (deleted)
...

My input config

---
- type: log
  paths:
  - /data/disk0/logs/my-service/*/my-service.log

  encoding: plain
  exclude_lines:
    - '^20[0-9]{2}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2}.[0-9]{3} INFO'
  fields:
    log_prefix: dc
    log_idx: my-service-logs
  fields_under_root: false
  document_type: my-service-logs
  scan_frequency: 10s
  harvester_buffer_size: 16384
  max_bytes: 10485760

  multiline:
    pattern: '^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'
    negate: true
    match: after
  tail_files: false

  backoff: 1s
  max_backoff: 10s
  backoff_factor: 2

  close_inactive: 5m

  close_renamed: false

  close_removed: true

  close_eof: false

  clean_inactive: 0

  clean_removed: true

  close_timeout: 0

The /data/disk0/logs/my-service/*/my-service.log files are rotated about eveyr 30 mins. 10 older log files are kept and then they are deleted.

I would have expected close_inactive: 5m to keep this issue from happening.

Any tips on where to start with this?

Cheers,
AB

Going through similar topics and just filling in some additional info.

Logstash is not really under any significant pressure.
I see no connection errors in the Filebeat logs.

Filebeat is shipping to Logstash. Logstash and Elasticsearch are also version 6.8.2.

Hi @A_B :slightly_smiling_face:

We don't know any know issue about this. One question: Filebeat releases after a while, even if it's long? Or it simply doesn't release them, even if days have passed?

Hi @Mario_Castro,

thank you for the reply :slight_smile:

I will try to monitor if files are released after some time....

From the filebeat logs I can see that .metrics.filebeat.harvester.open_files is generally growing but does decrees temporally now and then

Looking at the current Filebeat log file from this weekend I can see that Filebeat did close a couple of these log files when close_inactive was reached.

# grep close_inactive /var/log/filebeat/filebeat | grep my-service
2019-11-02T04:27:43.749Z	INFO	log/harvester.go:280	File is inactive: /data/disk0/logs/my-service/my-node-2/my-service.log. Closing because close_inactive of 5m0s reached.
2019-11-02T22:52:20.440Z	INFO	log/harvester.go:280	File is inactive: /data/disk0/logs/my-service/my-node-3/my-service.log. Closing because close_inactive of 5m0s reached.
2019-11-03T13:07:52.963Z	INFO	log/harvester.go:280	File is inactive: /data/disk0/logs/my-service/my-node-1/my-service.log. Closing because close_inactive of 5m0s reached.
2019-11-04T06:21:18.704Z	INFO	log/harvester.go:280	File is inactive: /data/disk0/logs/my-service/my-node-0/my-service.log. Closing because close_inactive of 5m0s reached.

But over the same time span there are 129 deleted files been kept open

# lsof -p $(pidof filebeat) | grep deleted | wc -l
129

I could set close_timeout for these files. Is there any risk in setting a long close_timeout? Something like 6h or 12h, so that the file should definitely have been rotated by then...

Enabled debug logging for harvester on one of the machines. I noticed that Filebeat is quite busy discarding log lines as for these logs I only want WARN and CRITICAL. over 95% of all lines are discarded, probably over 99%. These logs are quite busy. I noticed that Filebeat logs are now rotated out after about a minute, so not sure how much useful information I will be able to capture...

Adding close_timeout seems to have mitigated the problem. Rotated log files are still kept open but Filebeat, or more specifically the harvester seems to respect close_timeout more rigidly than close_inactive. In my case the potential side effect of data loss is more acceptable then keeping the deleted files open.

I'm not fully sure what might be happening. I found out this PR however, maybe it helps you somehow https://github.com/elastic/beats/pull/13907

Hi @Mario_Castro,

thank you very much for the info :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.