Potential memory leak issue with filebeat and metricbeat

Since upgrading from ELK 7.17.1 to ELK 8.6.2 (and even with ELK 8.7.1) we have been experiencing OOMKilled filebeat and metricbeat pods. We had no issues with ELK 7.17.1. Increasing the resource allocations does not resolve the issue and simply delays the crash. This appears to be a memory leak in Beats.

    State:          Running
      Started:      Thu, 25 May 2023 15:18:43 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 25 May 2023 02:53:22 +0000
      Finished:     Thu, 25 May 2023 15:18:41 +0000
    Ready:          True

This is an example of our filebeat pod memory usage over the past 24 hours:

We have tried this config, which was mentioned in other posts, but it makes no difference. We also do not use cron jobs.

        processors:
          - add_kubernetes_metadata:
              add_resource_metadata:
                deployment: false
                cronjob: false

We are also seeing this in ELK 8.8.0.

Can you post your complete Filebeat configuration?

What we need are heap profiles from Filebeat, which should tell us what is using the memory. The instructions to do this are:

  1. Start the Beat process with pprof (profiling) enabled. This allows us to easily extract memory profiles of the running process. Add these configuration options:

        http.host: localhost
        http.port: 6060
        http.pprof.enabled: true

  2. Once the Beat has started and is done initializing (after 5-10 minutes), collect the first memory dump with a simple curl command like this: curl -s -v http://localhost:6060/debug/pprof/heap > heap_normal.bin.
  3. Once you notice the process taking an excessive amount of memory, generate a second dump the same way: curl -s -v http://localhost:6060/debug/pprof/heap > heap_high.bin.
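Taking the dumps can also be scripted. Below is a minimal sketch of a hypothetical helper (not part of Beats) that fetches the Go heap profile from the pprof endpoint and writes it to a file, assuming the host and port from the config above:

```python
import urllib.request


def dump_heap(host: str, port: int, outfile: str) -> int:
    """Fetch the Go heap profile from the Beat's pprof endpoint
    and write it to outfile. Returns the number of bytes written."""
    url = f"http://{host}:{port}/debug/pprof/heap"
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open(outfile, "wb") as f:
        f.write(data)
    return len(data)


# Usage (hypothetical): call once shortly after startup and again
# when memory is high, e.g.
#   dump_heap("localhost", 6060, "heap_normal.bin")
#   dump_heap("localhost", 6060, "heap_high.bin")
```

The resulting .bin files can then be inspected locally with go tool pprof, e.g. go tool pprof -top heap_high.bin, or diffed against the baseline with the -base flag.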

If you attach the .bin files we can analyze them to see what is going on. The profile taken when the memory usage is excessive is the most important one.
