Filebeat slows down as registry file grows

Filebeat version: 7.9.2
OS: Ubuntu 18.04.5 LTS
VM: 16 CPU, 64 GB RAM
Output: Kafka

I have an issue with Filebeat's throughput. About 600 files are created every minute, each with around 4000 lines of roughly 1000 characters per line. None of those files will ever be written to again.
The VM and the Kafka cluster are in the same physical location; the Kafka cluster is stable and has enough resources to handle far more connections/events than I need to forward.

Filebeat config:

filebeat.config.inputs:
  enabled: true
  path: /etc/filebeat/configs/live/*.yml
  reload.enabled: true
  reload.period: 10s

filebeat.registry.flush: 1m

monitoring.enabled: false
http.enabled: true
monitoring.cluster_uuid: ...

logging.metrics.enabled: false
logging.to_syslog: false
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 8
  permissions: "0640"

processors:
  - drop_fields:
      fields: ...

output.kafka:
  hosts: ...
  topic: ...
  topics:
    - topic: ...
      when.has_fields: ...
  partition.round_robin:
    reachable_only: true
  ssl.certificate_authorities: ...
  username: ...
  password: ...
  required_acks: 1
  compression: snappy

Log config:

- type: log
  paths:
    - .../*.log
  scan_frequency: 20s
  file_identity.path: ~
  close_renamed: true
  close_removed: true
  close_inactive: 10s
  clean_removed: true
  fields_under_root: true
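One direction I'm considering is letting Filebeat prune dead registry entries itself instead of relying on deletions alone. A sketch of the same input with illustrative values (per the docs, clean_inactive has to be larger than ignore_older + scan_frequency):

- type: log
  paths:
    - .../*.log
  scan_frequency: 20s
  file_identity.path: ~
  ignore_older: 5m
  clean_inactive: 6m    # > ignore_older + scan_frequency; drops finished entries from the registry
  close_inactive: 10s
  clean_removed: true
  fields_under_root: true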

I have a small app that tails Filebeat's log and deletes files that Filebeat closes due to inactivity.
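For illustration, the idea behind that app is roughly this (a sketch, not the actual app; the exact "File is inactive ... Closing because close_inactive ..." log wording is an assumption and varies between Filebeat versions):

```python
import os
import re
import time

# Assumed harvester log message, e.g.:
#   "File is inactive: /path/to/file.log. Closing because close_inactive of 10s reached."
# Adjust the pattern to whatever your Filebeat version actually logs.
CLOSED_RE = re.compile(r"File is inactive: (?P<path>\S+)\. Closing because close_inactive")

def extract_closed_path(line: str):
    """Return the path of a file Filebeat just closed, or None."""
    m = CLOSED_RE.search(line)
    return m.group("path") if m else None

def follow(log_path: str):
    """Yield lines appended to Filebeat's own log (a minimal `tail -f`)."""
    with open(log_path) as f:
        f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

def run(log_path: str = "/var/log/filebeat/filebeat"):
    for line in follow(log_path):
        path = extract_closed_path(line)
        if path and os.path.exists(path):
            os.remove(path)  # Filebeat has already read and closed the file
```

Note that a closed harvester only means the file was fully read, not that Kafka has acknowledged every event from it, so this deletes slightly ahead of end-to-end delivery.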

The issue:

It takes about 90 minutes for the numbered JSON file in /var/lib/filebeat/registry/filebeat/ to grow to around 25-30 MB, and then the cycle starts again. I tried fiddling with different settings and the registry flush interval without deleting the registry folder, but now it's even worse: the current file is over 50 MB (51M Sep 22 16:09 25303522.json), and the backlog of files to ingest keeps growing because fewer events are being shipped than written.

Even though the VM has resources to spare, Filebeat gradually hogs almost all available memory, yet it never opens more than ~2500 handles (file descriptors/harvesters) or uses more CPU; over time it focuses on writing to the registry file and leaves everything else in the dust.

I'm monitoring the Kafka cluster too, and there were no errors or resource issues (disk/CPU/RAM/network). Also, every time I stop Filebeat and remove the /var/lib/filebeat folder, throughput spikes and disk writes plummet. So I'm fairly sure that's where the fix needs to happen; I just don't know the best way to keep my files forwarded to the MQ safely, without duplication, while maxing out throughput.
Until now I typically had only a couple of long-lived files (even multiple GBs each) that were closed after rotation, and I never had throughput issues with Filebeat. Tracking tens of thousands of files, and removing them from the registry after they are deleted, is a new workload for Filebeat here too.

I'm open to different solutions and if needed I can provide more metrics.
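For the throughput side, these are the internal-queue and batch-size knobs I'd consider tuning (illustrative values only, not tested here):

queue.mem:
  events: 65536            # default is 4096; allows more events in flight
  flush.min_events: 4096   # hand larger batches to the output
  flush.timeout: 5s

output.kafka:
  bulk_max_size: 4096      # max events per Kafka batch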

On top of that, I stopped the process that creates the files. Four hours later, when Filebeat had stopped processing events, I manually removed all the files, but the registry file's size only dropped from 37 MB to 18 MB. I waited more than 10 minutes (scan_frequency: 20s) and then restarted Filebeat; no change. Not sure if the differing transaction IDs have anything to do with it:

memlog/store.go:119 Loading data file of '/var/lib/filebeat/registry/filebeat' succeeded. Active transaction id=33395489
memlog/store.go:124 Finished loading transaction log file for '/var/lib/filebeat/registry/filebeat'. Active transaction id=33437597

Naturally, none of the files in the registry exist anymore.

No takers?
