I am trying to use Filebeat to extract lines that contain a particular string and send those lines to Elasticsearch (or Logstash).
I can use either "include_lines" or "exclude_lines" to grab the relevant lines and send them. The problem is that using either option causes the CPU on the Filebeat machine (Windows Server 2008) to climb above 70% and stay there for as long as the logs are being generated.
The logs are application specific and are generated at a rate of roughly one 50 MB log file every 10-12 minutes while the application is running. Each log file contains ~600,000 lines, of which only around 200-300 match the "include_lines" string.
If I run Filebeat without the "include_lines" match and just send everything, it happily runs at about 2-4% CPU and under 30 MB of memory. But Logstash at the far end crashes under the strain of all those messages in no time at all (even though every message that does not match my string is dropped immediately).
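For context, a prospector configured this way would look something like the sketch below (the path, input type, and match string are illustrative placeholders, not the actual values, and the exact config layout depends on the Filebeat version):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - 'C:\app\logs\*.log'
    # include_lines takes a list of regular expressions;
    # only lines matching at least one of them are shipped.
    include_lines: ['MY_STRING']
```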
Is this normal?
I would be very grateful for any suggestions for how this could be tuned for better performance.
I think the regex matcher in Filebeat matches any substring of a line by default. That is, if you want to match the beginning of a line you have to use ^, and $ for the end of a line. This makes leading and trailing .* patterns effectively no-ops, but the engine still has to execute them (with longest-match semantics by default).
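Concretely, the suggestion amounts to dropping the redundant wildcards (the pattern shown is a placeholder):

```yaml
    # Patterns are matched as substrings by default, so wrapping
    # a literal in .* adds work without changing what matches:
    # include_lines: ['.*MY_STRING.*']   # slow: the .* still executes
    include_lines: ['MY_STRING']         # same matches, less work
    # To require the match at the start of a line, anchor explicitly:
    # include_lines: ['^MY_STRING']
```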
Thank you for the reply.
I just tried changing the regex as you described and restarted the service. CPU usage for the Filebeat process touched 81% at the beginning, then settled down to about 60% after about a minute.