I have set up Filebeat to ship a high-traffic Nginx access log to Logstash. However, Filebeat intermittently hangs, and each hang comes with a consistent pattern of increasing memory cache usage.
Filebeat and Nginx run as separate containers in the same pod in a Kubernetes (k8s) cluster, sharing the access log directory through a mounted volume.
Access log files are rotated every 4 hours. The rotation method is simple: rename the file, then compress it with gzip.
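For readers who want to reproduce the setup, the rename-then-gzip rotation described above might look roughly like the sketch below. This is not the actual rotation script from this deployment; the function name `rotate` and the timestamp suffix are assumptions made for illustration, and the step that tells Nginx to reopen its log file is only marked with a comment.

```python
import gzip
import os
import shutil
import time


def rotate(log_path, suffix=None):
    """Rename log_path, gzip the renamed file, and return the .gz path."""
    # Hypothetical naming scheme: access.log -> access.log.<timestamp>
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    rotated = f"{log_path}.{suffix}"

    # Step 1: rename. The inode is unchanged, so any process that still
    # has the file open keeps reading or writing the same file.
    os.rename(log_path, rotated)

    # A real setup would ask Nginx to reopen its log file at this point;
    # that step is omitted from this sketch.

    # Step 2: compress the renamed file, then remove the uncompressed copy.
    gz_path = rotated + ".gz"
    with open(rotated, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(rotated)
    return gz_path
```

Note that after the final `os.remove`, a reader that still holds the renamed file open is holding a file that no longer has a name on disk.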
I've observed a peculiar pattern: the hung Filebeat resumes when logrotate runs. When the log file rotates, Filebeat starts working again (and memory cache usage drops), and logs are sent until, after a certain period of time, it hangs again. I speculate that this could be a problem with how the harvester uses file descriptors.
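One way to check the file descriptor speculation is to look at which files the Filebeat process holds open. The sketch below is a minimal Linux-only helper that reads the symlinks under `/proc/<pid>/fd`; the kernel appends ` (deleted)` to the link target when the file has been unlinked. The function names and the `proc_root` parameter are assumptions for illustration, and the PID has to be looked up separately inside the Filebeat container.

```python
import os


def open_files(pid, proc_root="/proc"):
    """Return {fd_number: target_path} for the given process (Linux only)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        result[int(name)] = target
    return result


def deleted_files(pid, proc_root="/proc"):
    """Return only the descriptors that point at files already unlinked."""
    return {
        fd: target
        for fd, target in open_files(pid, proc_root).items()
        if target.endswith(" (deleted)")
    }
```

If `deleted_files` keeps returning rotated access logs long after rotation, the harvester is still holding them; if it returns nothing during a hang, the descriptor theory becomes less likely.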
In addition, Filebeat does not hang during the early morning hours, when traffic is low. During the daytime, when traffic is high, the log file grows to about 5GB every 4 hours.
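To put the daytime figure in perspective, the average write rate can be worked out from the numbers above. The snippet assumes "5GB" means 5 GiB and that the log grows evenly over the 4-hour window; real traffic is burstier, so peaks will be higher than this average.

```python
def average_rate_mib_per_s(size_gib, window_hours):
    """Average log growth in MiB/s for a file of size_gib written over window_hours."""
    return size_gib * 1024 / (window_hours * 3600)


rate = average_rate_mib_per_s(5, 4)
print(f"{rate:.2f} MiB/s")  # prints "0.36 MiB/s"
```

This gives a rough baseline to compare against the rate at which Filebeat actually ships events to Logstash.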
Would you be able to update to Filebeat 7.17 (or 8.x)? The newer filestream input is GA in those versions and may solve performance issues found in the log input. You can also give this input a try on 7.12, but it was still in beta then.
Hello, @jsoriano
I want to express my deepest gratitude once again for your recommendation.
I have upgraded Filebeat to version 7.17.12 and configured it to use the filestream input type. I applied this to half of the instances we operate and tested it over several days. As a result, there has been a significant performance improvement, with about a 20% increase in the volume of logs being ingested.
However, log ingestion gaps are still occurring. If you look at the attached Kibana view, you will see that there are gaps in log ingestion that are resolved at the logrotate interval. I suspect that Filebeat may have a resource leak when handling large files.
This pattern occurs during high-traffic periods from 12:00 to 24:00 and is characterized by a rapid increase in container memory cache (page cache) usage, which is resolved at the logrotate interval.
Memory cache increases around 15:00 and resolves at the 16:00 logrotate
Memory cache increases around 19:00 and resolves at the 20:00 logrotate
Memory cache increases around 23:00 and resolves at the 24:00 logrotate
I could consider running logrotate more frequently, but that doesn't seem like a fundamental solution. Do you have any other tuning suggestions to further improve Filebeat performance in this regard?