Filebeat Intermittently Hanging with Increasing Memory Cache while Processing High-Traffic Nginx Accesslog


I have set up a pipeline using Filebeat to ship a high-traffic Nginx access log to Logstash. However, Filebeat intermittently hangs, with a consistent pattern of rising memory (page cache) usage.

Filebeat and Nginx run as separate containers in the same pod in a Kubernetes (k8s) setup, with the access log path shared via a volume mount.

Access log files are rotated every 4 hours. The rotation method is simple: rename the file, then gzip it. It goes like this:

  • application.log -> application.{date}.log -> application.{date}.log.gz
  • and then we start writing a new application.log.
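The rotation steps above can be sketched as a shell snippet (the directory, file contents, and date format are placeholders, not the actual rotation job):

```shell
# Sketch of the "rename, then gzip" rotation described above (illustrative paths only).
logdir=$(mktemp -d)                       # stands in for the real log directory
echo 'GET / 200' > "$logdir/application.log"

stamp=$(date +%Y%m%d%H)
mv "$logdir/application.log" "$logdir/application.$stamp.log"   # 1. rename
gzip "$logdir/application.$stamp.log"                           # 2. gzip the rotated file
: > "$logdir/application.log"                                   # 3. start a new empty application.log
```

Note that step 1 is a rename of the same inode, so any process that already has the file open keeps reading the old file.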

I've observed a peculiar pattern. The hung Filebeat recovers through the logrotate process: when the log file rotates, Filebeat starts operating again (and memory cache usage drops), and logs are sent until, after some time, it hangs again. I suspect this could be a problem with the harvester's file descriptor usage.
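That speculation is easy to illustrate: on Linux, a descriptor opened before the rotation still points at the renamed inode, so a stuck harvester keeps the old file pinned. A toy demonstration, using the shell's own descriptor as a stand-in for the harvester (all paths are made up):

```shell
# Stand-in for a harvester: open the log, then let "logrotate" rename it.
tmpdir=$(mktemp -d)
echo 'GET / 200' > "$tmpdir/application.log"
exec 3< "$tmpdir/application.log"                            # "harvester" opens the file
mv "$tmpdir/application.log" "$tmpdir/application.old.log"   # rotation renames the same inode
target=$(readlink /proc/$$/fd/3)                             # where does fd 3 point now?
echo "$target"                                               # prints the renamed path
exec 3<&-                                                    # close the descriptor
```

Until that descriptor is closed, the kernel cannot release the old file, which would match the "memory only drops at rotation" pattern.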

In addition, Filebeat does not hang during early-morning hours when traffic is low. During the daytime, when traffic is high, the log file grows to about 5 GB per 4-hour rotation window.

  • Filebeat version: 7.12.1
  • Filebeat configuration:
  queue.mem:
    events: 40960
    flush.min_events: 20480

  filebeat.inputs:
  - type: log
    fields:
      _@type: nginx-log
      instance_name: myinstance
    fields_under_root: true
    exclude_files: ['\.gz$']
    paths:
      - mylogpath/application.log*

  output.logstash:
    hosts: ["mylogstashinfo"]
    loadbalance: true

  logging:
    level: warning
    to_files: true
    to_syslog: false
    files:
      path: /myfilebeatpath/logs
      name: filebeat-plain.log
      keepfiles: 10

Any insights or possible solutions for this hanging issue would be greatly appreciated.

Thank you!

Hey @ryoni88, welcome to discuss 🙂

Would you have a chance to update to Filebeat 7.17 (or 8.x)? The newer filestream input is GA in that version and may solve performance issues found in the log input. You can give this input a try on 7.12, but it was still in beta then.

You can read more about this input in filestream input | Filebeat Reference [7.17] | Elastic
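As a rough illustration only (not the poster's actual config), the log input from the question might map to filestream along these lines; the `id` field is new here and must be unique per input, and the scanner option names are the filestream equivalents of the old ones:

```yaml
filebeat.inputs:
- type: filestream
  id: nginx-access                                  # hypothetical id, must be unique
  paths:
    - mylogpath/application.log*
  prospector.scanner.exclude_files: ['\.gz$']       # filestream's equivalent of exclude_files
  fields:
    _@type: nginx-log
    instance_name: myinstance
  fields_under_root: true
```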


Hello, @jsoriano 😀
I want to express my deepest gratitude once again for your recommendation. 🙏🙏

I have upgraded Filebeat to version 7.17.12 and configured it to use the filestream input type. I applied this to half of the instances we operate and tested it over several days. The result is a significant performance improvement: roughly a 20% increase in ingested log volume.

However, log ingestion gaps still occur. The attached Kibana view shows gaps in ingestion that resolve at the logrotate interval. I suspect Filebeat may have a resource leak when handling large files.

This pattern occurs during the high-traffic period from 12:00 to 24:00 and is characterized by a rapid increase in container memory cache (page cache) usage, which clears at the logrotate interval.

  • Increases around 15:00 and resolves at 16:00 logrotate
  • Increases around 19:00 and resolves at 20:00 logrotate
  • Increases around 23:00 and resolves at 24:00 logrotate

I could run logrotate more frequently, but that doesn't seem like a fundamental solution. Do you have any other tuning suggestions to further improve Filebeat's performance in this regard?

I will also attach the memory usage of filebeat.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.