Hi,
We have Filebeat instances on a huge number of servers, but a couple of them are problematic. The configuration looks like this:
- Filebeat harvests files from the folder /mnt/log/*.log and sends them to an ES cluster.
- There is a lot of traffic in this folder. In peak hours new files are created every couple of seconds and can grow to over 10 MB.
- Every 30 minutes a Python backup script sends all files to AWS S3. It checks 'lsof' first to find out whether the files are in use.
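For context, the in-use check in the backup script works roughly like this (a minimal sketch; the helper name and the parsing are my own, since I cannot share the real script):

```python
def open_log_files(lsof_output, prefix="/mnt/log/"):
    """Return the set of files under `prefix` that appear in plain
    `lsof` output (a header row plus one line per open file).
    Hypothetical helper -- the real script works along these lines."""
    open_files = set()
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[-1].startswith(prefix):  # NAME is the last column
            open_files.add(fields[-1])
    return open_files
```

The script skips every file in this set, which is why files held open by Filebeat never get uploaded.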
The backup script fails every time: Filebeat keeps all the log files open, so nothing gets uploaded, the disk fills up, and we have to stop Filebeat and rerun the backup script to clear it.
Yesterday I turned on debug logging for Filebeat and saw that it needs over 30 minutes to harvest a single file.
So I think it can't keep up with harvesting all those files within the 30-minute window: new files keep being created, files are not uploaded to S3, /mnt reaches 100%, and everything is stuck. Events are still published to the ES cluster, with some delay, until we stop Filebeat.
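A back-of-envelope calculation (with assumed numbers, since the exact file rate varies) shows how quickly a slow harvester falls behind:

```python
# Rough throughput estimate; the interval and size are assumptions
# based on "new files every couple of seconds" growing to ~10 MB.
new_file_interval_s = 5            # one new file every ~5 s (assumed)
file_size_mb = 10                  # each file up to ~10 MB

sustained_rate_mb_s = file_size_mb / new_file_interval_s
window_s = 30 * 60                 # the 30-minute backup window
data_per_window_mb = sustained_rate_mb_s * window_s

print(sustained_rate_mb_s)   # MB/s Filebeat must ship to keep up
print(data_per_window_mb)    # MB produced per backup window
```

Under those assumptions Filebeat has to sustain 2 MB/s, and every backup window produces about 3.6 GB of new data, so spending 30 minutes on one file means the backlog only ever grows.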
We're using Filebeat 5.1.1 now, and each server has a few prospectors configured like this (sorry for the XXX'es, I cannot share much more detail):
- paths:
    - /mnt/log/XXX*
  input_type: log
  exclude_files: [".gz$"]
  document_type: vom_urs
  multiline:
    pattern: ^@?[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}
    negate: true
    match: after
  ignore_older: 30m
  close_inactive: 10s
  close_renamed: true
  close_removed: true
  clean_inactive: 10m
  clean_removed: true
  scan_frequency: 1s
The output is set to ES and looks like this:
elasticsearch:
  hosts: ["http://X.X.X.X:9200", "http://X.X.X.X:9200", "http://X.X.X.X:9200", "http://X.X.X.X:9200"]
  bulk_max_size: 4096
  loadbalance: true
  worker: 4
Is there a way to speed up or optimize the harvesting process in such a situation?