Filebeat version: 7.9.2
OS: Ubuntu 18.04.5 LTS
VM: 16 CPU, 64 GB RAM
I have an issue with Filebeat's throughput. About 600 files are created every minute, each with around 4,000 lines of roughly 1,000 characters. None of them is ever written to again.
The VM and the Kafka cluster are in the same physical location; the Kafka cluster is stable and has enough resources to handle far more connections/events than I need to forward.
```yaml
filebeat.config.inputs:
  enabled: true
  path: /etc/filebeat/configs/live/*.yml
  reload.enabled: true
  reload.period: 10s

filebeat.registry.flush: 1m

monitoring:
  enabled: false
http.enabled: true
monitoring.cluster_uuid: ...

logging.metrics.enabled: false
logging.to_syslog: false
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 8
  permissions: "0640"

processors:
  - drop_fields:
      fields: ...

output.kafka:
  hosts: ...
  topic: ...
  topics:
    - topic: ...
      when.has_fields: ...
  partition.round_robin:
    reachable_only: true
  ssl.certificate_authorities: ...
  username: ...
  password: ...
  required_acks: 1
  compression: snappy
```
```yaml
- type: log
  paths:
    - .../*.log
  scan_frequency: 20s
  file_identity.path: ~
  close_renamed: true
  close_removed: true
  close_inactive: 10s
  clean_removed: true
  fields_under_root: true
  fields: ...
```
I have a small app that tails Filebeat's log and deletes files that Filebeat closes due to inactivity.
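The deleter is roughly equivalent to this sketch (the log-line wording in the regex is based on what my 7.9 instance emits when `close_inactive` fires; adjust it to your Filebeat version and log format):

```python
import os
import re
import time

# Filebeat 7.x logs a line like the following when close_inactive triggers
# (exact wording may differ between versions -- adjust the regex as needed):
#   ... INFO ... File is inactive: /data/logs/app-1.log. Closing because close_inactive of 10s reached.
CLOSED_RE = re.compile(r"File is inactive: (?P<path>\S+)\. Closing because close_inactive")

def closed_path(line):
    """Return the file path from a 'closed due to inactivity' log line, or None."""
    m = CLOSED_RE.search(line)
    return m.group("path") if m else None

def tail_and_delete(log_path):
    """Follow Filebeat's own log (like `tail -f`) and delete files it has closed."""
    with open(log_path) as f:
        f.seek(0, os.SEEK_END)  # start at the end of the log
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # wait for Filebeat to write more
                continue
            path = closed_path(line)
            if path and os.path.exists(path):
                os.remove(path)
```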
It takes about 90 minutes for the numbered JSON file in
/var/lib/filebeat/registry/filebeat/ to grow to around 25-30 MB, and then the cycle starts again. I tried fiddling with different settings and the registry flush interval without deleting the registry folder, but now it's even worse; the current file is over 50 MB:
`51M Sep 22 16:09 25303522.json`, and the backlog of files to ingest is growing, since fewer events are being shipped than written.
Although the VM has plenty of spare resources, Filebeat gradually hogs almost all available memory, yet it never opens more than ~2,500 handles (file descriptors/harvesters) or uses more CPU; over time it focuses on writing to the registry file and leaves everything else in the dust.
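The handle count comes from /proc (Linux-only; a rough check that counts the open descriptors of the Filebeat process):

```python
import os

def open_fd_count(pid):
    """Number of open file descriptors for a process, via Linux /proc."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# e.g. open_fd_count(<filebeat pid>) -- on my box this plateaus around 2500
```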
I'm monitoring the Kafka cluster too, and there were no errors or resource issues (disk/CPU/RAM/network). Also, every time I stop Filebeat and remove the
/var/lib/filebeat folder, throughput spikes and disk writes plummet. So I'm fairly sure that's where the fix needs to happen; I just don't know the best way to keep forwarding my files to the MQ safely, without duplication, while maximizing throughput.
Until now my typical workload was a handful of files written to over time (even growing to multiple GBs) and closed after rotation, and I never had throughput issues with Filebeat. Tracking tens of thousands of files, and removing them from the registry after they are deleted, is a new workload for this Filebeat instance too.
I'm open to different solutions and if needed I can provide more metrics.