Filebeat 6.8.20 running on CentOS 7.9
We're using Filebeat to send logs to our Elasticsearch cluster (by way of Logstash and a Kafka layer, which provides retention in case of cluster downtime), and as the data volume has increased, we've found the files are being appended to faster than Filebeat can send their contents. (There is no lag in the Kafka layer.) A new log file is created each hour, and by looking at the process's open file handles and at what's arriving in the Elasticsearch cluster, we can see that over time Filebeat is still reading the files for previous hours as well as the current hour. (Right now it's still reading files from the previous 11 hours.)
The size of a log file covering one hour varies over the course of a day, from a minimum of around ~40 million lines (~14 GB) up to ~90 million lines (~31 GB). I'm wondering if the size of the files is a problem, i.e. whether the speed at which Filebeat can read a file degrades as the file grows, or whether the speed at which the logs are now being written simply exceeds the speed at which Filebeat can read them. I can't find any information about this, though, nor about whether there's a recommended maximum file size for Filebeat or anything like that.
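In case it helps with diagnosis: to see whether the harvesters or the output are the slow side, I've been thinking of watching Filebeat's periodic internal metrics in its own log. This is just a sketch based on my reading of the 6.8 docs (and I believe the metrics logging is on by default anyway, shown here explicitly):

    logging.level: info
    logging.metrics.enabled: true   # log "Non-zero metrics in the last 30s" lines (harvester counts, events acked, etc.)
    logging.metrics.period: 30s     # the default period, as I understand it

The idea being that if libbeat's output-acked counters are flat while harvesters stay open, the bottleneck is downstream of the reading itself.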
I'm wondering about changing the logging so that new files are created more often, say every 15 minutes rather than every hour, so that each file is at most ~23 million lines. Is there any reason to believe that would help Filebeat's performance?
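For context, these are the harvester lifecycle options that I believe interact with rotation frequency; they'd sit under the same input as in the config below, and as far as I know we're running the 6.8 defaults for all of them (values here are my reading of the docs, not something we've tuned):

    close_inactive: 5m    # close a harvester after 5 minutes with no new lines (default, per docs)
    harvester_limit: 0    # no cap on concurrently open harvesters (default)
    close_timeout: 0      # never force-close a harvester that is still reading (default)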
Config:
- input_type: log
  paths:
    - /var/lib/argus/logstashStream/argusFlow*
  exclude_files: [".gz$"]
  document_type: netflow
  harvester_buffer_size: 1048576000
  ignore_older: 12h

output.logstash:
  loadbalance: true
  bulk_max_size: 2048
  hosts: ["foo.bar:10101","foo.bar:10101","foo.bar:10101","foo.bar:10101","foo.bar:10101","foo.bar:10101"]
  workers: 6
The loadbalance option on the output is there because at some point in the past we found that it helped performance.
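In case it's relevant to any suggestions: these are the other queue/output knobs I'm aware of from the 6.8 docs that we have not changed. The values below are my understanding of the defaults, so please correct me if they're wrong:

    queue.mem:
      events: 4096              # size of the internal event queue
      flush.min_events: 2048    # events to accumulate before handing a batch to the output
      flush.timeout: 1s         # hand over a partial batch if it doesn't fill in time

    output.logstash:
      pipelining: 2             # async batches in flight per connection
      compression_level: 3      # gzip level on the beats protocol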