Filebeat slows down as registry file grows

Filebeat version: 7.9.2
OS: Ubuntu 18.04.5 LTS
VM: 16 CPU, 64 GB RAM
Output: Kafka

I have an issue with Filebeat's throughput. About 600 files are created every minute, each with around 4000 lines of roughly 1000 characters per line. None of those files will ever be written to again.
The VM and the Kafka cluster are in the same physical location; the Kafka cluster is stable and has enough resources to handle far more connections/events than I need to forward.

Filebeat config:

filebeat.config.inputs:
  enabled: true
  path: /etc/filebeat/configs/live/*.yml
  reload.enabled: true
  reload.period: 10s

filebeat.registry.flush: 1m

monitoring.enabled: false
http.enabled: true
monitoring.cluster_uuid: ...

logging.metrics.enabled: false
logging.to_syslog: false
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 8
  permissions: "0640"

processors:
  - drop_fields:
      fields: ...

output.kafka:
  hosts: ...
  topic: ...
  topics:
    - topic: ...
      when.has_fields: ...
  partition.round_robin:
    reachable_only: true
  ssl.certificate_authorities: ...
  username: ...
  password: ...
  required_acks: 1
  compression: snappy

Log config:

- type: log
  paths:
    - .../*.log
  scan_frequency: 20s
  file_identity.path: ~
  close_renamed: true
  close_removed: true
  close_inactive: 10s
  clean_removed: true
  fields_under_root: true
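One direction I'm considering is letting Filebeat prune dead registry entries itself instead of relying on deletions alone. A sketch of the same input with illustrative values (per the docs, clean_inactive has to be larger than ignore_older + scan_frequency):

- type: log
  paths:
    - .../*.log
  scan_frequency: 20s
  file_identity.path: ~
  ignore_older: 5m
  clean_inactive: 6m    # > ignore_older + scan_frequency; drops finished entries from the registry
  close_inactive: 10s
  clean_removed: true
  fields_under_root: true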

I have a small app that tails Filebeat's log and deletes files that Filebeat closes due to inactivity.
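For illustration, the idea behind that app is roughly this (a sketch, not the actual app; the exact "File is inactive ... Closing because close_inactive ..." log wording is an assumption and varies between Filebeat versions):

```python
import os
import re
import time

# Assumed harvester log message, e.g.:
#   "File is inactive: /path/to/file.log. Closing because close_inactive of 10s reached."
# Adjust the pattern to whatever your Filebeat version actually logs.
CLOSED_RE = re.compile(r"File is inactive: (?P<path>\S+)\. Closing because close_inactive")

def extract_closed_path(line: str):
    """Return the path of a file Filebeat just closed, or None."""
    m = CLOSED_RE.search(line)
    return m.group("path") if m else None

def follow(log_path: str):
    """Yield lines appended to Filebeat's own log (a minimal `tail -f`)."""
    with open(log_path) as f:
        f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

def run(log_path: str = "/var/log/filebeat/filebeat"):
    for line in follow(log_path):
        path = extract_closed_path(line)
        if path and os.path.exists(path):
            os.remove(path)  # Filebeat has already read and closed the file
```

Note that a closed harvester only means the file was fully read, not that Kafka has acknowledged every event from it, so this deletes slightly ahead of end-to-end delivery.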

The issue:

It takes about 90 minutes for the numbered JSON file in /var/lib/filebeat/registry/filebeat/ to grow to around 25-30 MB, and then the cycle starts again. I tried fiddling with different settings and the registry flush interval without deleting the registry folder, but now it's even worse: the current file is over 50 MB (51M Sep 22 16:09 25303522.json), and the backlog of files to ingest keeps growing because fewer events are being shipped than written.

Even though the VM has resources to spare, Filebeat gradually hogs almost all available memory, yet it never opens more than ~2500 handles (file descriptors/harvesters) or uses more CPU; over time it focuses on writing to the registry file and leaves everything else in the dust.

I'm monitoring the Kafka cluster too, and there were no errors or resource issues (disk/CPU/RAM/network). Also, every time I stop Filebeat and remove the /var/lib/filebeat folder, throughput spikes and disk writes plummet. So I'm fairly sure that's where the fix needs to happen; I just don't know the best way to keep my files forwarded to the MQ safely, without duplication, while maxing out throughput.
Until now I typically had only a couple of long-lived files (even multiple GBs each) that were closed after rotation, and I never had throughput issues with Filebeat. Tracking tens of thousands of files, and removing them from the registry after they are deleted, is a new workload for Filebeat here too.

I'm open to different solutions and if needed I can provide more metrics.
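For the throughput side, these are the internal-queue and batch-size knobs I'd consider tuning (illustrative values only, not tested here):

queue.mem:
  events: 65536            # default is 4096; allows more events in flight
  flush.min_events: 4096   # hand larger batches to the output
  flush.timeout: 5s

output.kafka:
  bulk_max_size: 4096      # max events per Kafka batch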

On top of that, I stopped the process that creates the files. Four hours later, when Filebeat had stopped processing events, I manually removed all the files, but the registry file's size only dropped from 37 MB to 18 MB. I waited more than 10 minutes (scan_frequency: 20s) and then restarted Filebeat; no change. Not sure if the differing transaction IDs have anything to do with it:

memlog/store.go:119 Loading data file of '/var/lib/filebeat/registry/filebeat' succeeded. Active transaction id=33395489
memlog/store.go:124 Finished loading transaction log file for '/var/lib/filebeat/registry/filebeat'. Active transaction id=33437597

Naturally, none of the files in the registry exist anymore.

No takers?
