We are trying to run Filebeat on a production server to ship logs to Logstash, but it is failing to ship logs as quickly as they are written. Inspecting Filebeat's registry shows that over the past 3 hours it has processed only 26 GB of an 81 GB log file. Our expected volume is 400-800 GB of logs per day from this one file. We are also seeing Filebeat use 60-70% CPU on this server, which is a very beefy machine. This makes us wonder whether Filebeat can handle the task at hand.
We have checked our Logstash ingester (which sits in front of RabbitMQ) and it does not appear to be a bottleneck. The rest of our pipeline seems healthy.
The bottleneck may not be Filebeat itself but Logstash or its outputs. For example, if you are using Elasticsearch as an output and it can't keep up with the events-per-second rate, it will put back pressure on Logstash, which will in turn put back pressure on Filebeat.
Most of the time the bottleneck is in the Logstash outputs or pipeline, not in Filebeat.
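If the output side does turn out to be the limiting factor, one common lever is to give Filebeat's Logstash output more parallelism. A minimal sketch of the relevant filebeat.yml settings follows; the host names and numbers here are illustrative assumptions, not values from this thread, so tune them for your own environment:

```yaml
# filebeat.yml -- output tuning sketch; hosts and numbers are illustrative
output.logstash:
  hosts: ["logstash-ingest-1:5044", "logstash-ingest-2:5044"]
  loadbalance: true      # distribute events across all listed hosts
  worker: 2              # network workers per configured host
  bulk_max_size: 2048    # maximum number of events per batch
```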
Filebeat outputs to an ingestion Logstash layer that does no filtering and only forwards to RabbitMQ; a second Logstash layer consumes from RabbitMQ, applies filters, and outputs to Elasticsearch. We have not seen any signs of bottlenecks in any of these downstream layers.
Update: we are in a good state now. I was able to use exclude_lines to filter out a lot of noise that didn't need to be shipped, which allowed Filebeat to keep pace. We were also able to decrease the log volume by stopping some superfluous processes.
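For anyone landing here later, a sketch of the kind of input config this refers to; the path and patterns are illustrative only, and on older Filebeat versions the section is `filebeat.prospectors` rather than `filebeat.inputs`:

```yaml
# filebeat.yml -- input sketch; path and patterns are illustrative only
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/app.log
    # regexes; matching lines are dropped before they are shipped
    exclude_lines: ['^DEBUG', 'health-check']
```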
Also, I misspoke when I said that Filebeat took up 60-70% of the CPU: it took up that much of a single core, not of the whole machine.