How fast can Filebeat send logs from disk?

Filebeat 7.17 sending over the network to Logstash 6 (we're looking to get it upgraded), running on CentOS 7 with 64GB RAM and 2 x 14-core Intel Xeon E5-2690 CPUs.

We've got some logs which often total as much as ~1.4 billion single-line events per day, with a total file size of over 400GB. The rate at which they're generated varies over the course of the day, and at peak it's well over 1 million lines per minute. The logs are written to disk such that each file contains 15 minutes' worth. A log file for a 15-minute period mid-afternoon yesterday is 20.4m lines and 6.1GB; that's roughly 22.7k lines/s (20.4m lines over 900 seconds). Is it realistic to expect to be able to send this volume of logs in real time with Filebeat?

We've been sending these logs with Filebeat for years, and to start with it was fine, but the volume has grown over time and for a while now no amount of fiddling with settings has made it keep up with real time. The amount of lag varies: sometimes it catches up overnight, sometimes it's still sending data from 24 hours ago.

We've been using the (now deprecated) log input. I'm looking at switching over to the new filestream input, but it's tricky because this is production data, there's a lot of it, and changing the input type means effectively losing the registry state that records where Filebeat has read up to in each file.
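For anyone weighing the same switch, a minimal filestream input looks something like this; the path and id below are illustrative placeholders, not our real config:

```yaml
# Minimal filestream sketch; the path and id are illustrative placeholders.
filebeat.inputs:
  - type: filestream
    id: app-logs            # must be unique; filestream keys its registry state by this id
    paths:
      - /var/log/app/*.log
```

On first start filestream has no state for the existing files, which is exactly the read-position problem above. One possible mitigation (an assumption on my part, not something we've tested) is to cut over at a 15-minute file boundary and use ignore_older to skip files the old log input has already shipped.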

Filebeat sends to multiple Logstash instances behind a load balancer, which send on to a Kafka queue that acts as a buffer against downtime of our Elasticsearch cluster. There's no lag in the Kafka layer, and I can see which files Filebeat has open handles on, so the issue is definitely that the data isn't being sent fast enough.

Hi @mikewillis

In my experience that is on the quite high side, assuming you mean 22k events/sec for a single Filebeat, and it comes out to about 7MB/s, which is a bit high as well. The events number is the bigger concern... Filebeat seems to be bound on the input side.

(Insert long story here: Filebeat was originally designed to be "thin", but log volumes keep growing and you are not the only one seeing these issues. I do think there is some work being done to "fatten" the Filebeat pipe a bit.)

You can try tuning the Logstash output with more workers, a larger bulk_max_size, and so on, but that may only have a limited effect / not work.
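For concreteness, a sketch of the relevant filebeat.yml knobs; the values are illustrative starting points to measure against, not recommendations, and the host is a placeholder:

```yaml
# Illustrative starting points only; measure, don't copy blindly.
output.logstash:
  hosts: ["logstash-lb.example.com:5044"]  # placeholder host
  worker: 4             # parallel connections per listed host
  bulk_max_size: 4096   # events per batch sent to Logstash
  pipelining: 2         # async batches in flight per connection
  compression_level: 3  # trade CPU for network bandwidth

# The output workers all drain Filebeat's internal memory queue,
# so it usually needs enlarging to keep the extra workers busy:
queue.mem:
  events: 32768
  flush.min_events: 4096
  flush.timeout: 1s
```

Raising worker counts without enlarging the memory queue often has little effect, since all the workers drain the same internal queue.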

You can also run more than one Filebeat on a host, if you have a naming scheme or directory hierarchy that lets you divide up the workload; see the sketch below.
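As a sketch of that split, assuming a hypothetical shard-numbered naming scheme (the paths and directories here are invented for illustration), each instance needs its own path.data so the registries don't collide:

```yaml
# filebeat-a.yml: first instance takes one slice of the files
filebeat.inputs:
  - type: filestream
    id: app-logs-a
    paths:
      - /var/log/app/*-shard-[0-4].log   # hypothetical naming scheme
path.data: /var/lib/filebeat-a           # each instance keeps a separate registry

# filebeat-b.yml would be identical apart from the id, paths, and path.data.
```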
