Filebeat High CPU consumption on two of four hosts

Hi,

I am using Filebeat version 6.2.4 on a RHEL host to ship log data from applications to a Logstash instance. We have four hosts configured to send data to the same Logstash hosts, and on two of them we are seeing significantly higher CPU usage (150%+, compared to less than 30% on the other two). The configuration files on these hosts are identical. However, the logs matched by the glob patterns could vary in size and content, because different applications run on each host.

After looking through previous posts regarding high CPU, I changed scan_frequency from 10s to 30s and set max_procs: 2, with only mild improvement. CPU was above 200% prior to the max_procs setting. Our registry file is about 2.9M in size, which may seem large, but the hosts using less than 30% CPU on average have a similar registry file size.

When comparing the monitoring output in the logs, the number of open files (ranging from 90 to 120) and the volume of data appear to be similar between hosts. I am therefore confused about what else could be contributing to such a significant difference in resource usage. Would you have any suggestions on what to look at next?

For anyone else that stumbles upon this post...

There were some extremely large log files in our environment, because trace and debug logging had been enabled for logs that Filebeat was capturing. I think we will be able to narrow down which log files send a significant amount of data by using Filebeat's debug logging for publish events.
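
For anyone wanting a quick first pass before turning on debug logging, here is a rough sketch that lists the largest files under a log directory. The function name largest_logs, the example directory and the default size threshold are my own placeholders, not anything from Filebeat, and the size syntax assumes GNU find as shipped on RHEL. Point it at the directories your glob patterns cover.

```shell
#!/bin/sh
# Sketch: list files above a size threshold under a log directory, biggest first.
# largest_logs, the 100M default and the example path are illustrative assumptions.
largest_logs() {
    log_dir="$1"
    min_size="${2:-100M}"   # find(1) size syntax, e.g. 100M or 1G (GNU find)
    # du -k prints size in KiB followed by the path; sort puts the biggest first.
    find "$log_dir" -type f -size "+$min_size" -exec du -k {} + 2>/dev/null \
        | sort -rn \
        | head -n 20
}

# Example (hypothetical path):
#   largest_logs /var/log/myapp 500M
```

Running this on a high-CPU host and on a low-CPU host and comparing the output should show fairly quickly whether a few oversized files account for the difference.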

I have some solutions for you:

1) Copy the log files to a non-production environment
By non-production environment I mean a server where the extra load does not negatively affect the service.

Use FB (Filebeat) there to send these big files to your log management system. You can use this tutorial.

2) Lower Filebeat's CPU priority relative to the running application
You can give the Filebeat service the lowest CPU scheduling priority. The lowest priority corresponds to a nice value of 19.

vim /usr/lib/systemd/system/filebeat.service
[Service]
Nice=19

This is very useful in a production environment.
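
One caveat with the approach above: files under /usr/lib/systemd/system belong to the package and can be overwritten on upgrade. A drop-in override under /etc/systemd/system is one alternative. Below is a minimal sketch; the helper name write_nice_dropin and the file name nice.conf are my own choices, not something Filebeat ships.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that sets Nice=19 for filebeat.service.
# write_nice_dropin and nice.conf are illustrative names (assumptions);
# systemd reads any *.conf file in the <unit>.d directory.
write_nice_dropin() {
    dropin_dir="${1:-/etc/systemd/system/filebeat.service.d}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nNice=19\n' > "$dropin_dir/nice.conf"
}

# Typical use, as root:
#   write_nice_dropin
#   systemctl daemon-reload
#   systemctl restart filebeat
#   ps -o pid,ni,comm -C filebeat   # the NI column should show 19
```

Whichever file you edit, the change only takes effect after a daemon-reload and a restart of the service. Note that a nice value only matters when other processes are competing for CPU, so Filebeat can still show high usage on an otherwise idle host.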

Hope it will be useful for you.
