Filebeat falling way behind when shipping output

Hey all!

We are load testing filebeat for shipping our kubernetes logs. Our current max log line rate on any given node is 5000 lines/s, so we are load testing at 7500 lines/s. We do this by spinning up a single pod with 10 containers, each logging at 750 lines/s for around 10 minutes (TL;DR: there is some random delay, so we are not literally logging 7500 lines/s, but on average that's what we're doing).
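For context, each load-producing container behaves roughly like this sketch (Python; the rate and jitter parameters are illustrative, and our actual generator differs in details):

```python
import random

def schedule(lines_per_sec=750, duration_s=600, jitter=0.2, seed=42):
    """Return emission timestamps for one container: on average
    `lines_per_sec` lines/s, with a random +/- `jitter` fraction of
    delay between lines (so the instantaneous rate varies)."""
    rng = random.Random(seed)
    base = 1.0 / lines_per_sec
    t, times = 0.0, []
    while t < duration_s:
        times.append(t)
        t += base * (1 + rng.uniform(-jitter, jitter))
    return times

times = schedule()
print(len(times))  # ~450,000 lines per container over 10 minutes
```

Ten such containers give ~4.5MM lines over the run, matching the totals below.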

So this ends up producing exactly 4.5MM log lines over an approximately 10-minute load test. What I am noticing is that long after the load-producing containers have stopped, only around 3.5 to 3.7MM log lines have been sent to the output (redis, for now). Eventually filebeat more or less catches up, sometimes shipping exactly 4.5MM logs to the output, sometimes falling a few thousand short.

We have also seen this failure mode with smaller load tests, such as 500 lines/s for 3MM total and 600 lines/s for 3.6MM total, which makes me think it's not related to the number of lines but to something else in our config or resources.

Also of note: we set queue.mem with max events at 65536, so it is strange to me that we have, in some cases, ~1MM events still being shipped after the inputs have been emptied.

Has anybody seen this before? Our ultimate goal is to reliably ship all 4.5MM events in near real time, not to run 1MM events behind. In a real-world situation, where we may be shipping 5000 lines/s for 3-4 hours during peak, it seems to me that filebeat would never be able to catch up at this rate?
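Here is the back-of-envelope math behind that worry (the sustained ship rate is an estimate derived from our observed totals, not a measured filebeat metric):

```python
# Numbers from our load test.
produced = 4_500_000        # lines produced in ~600 s (7500 lines/s)
shipped_at_end = 3_600_000  # lines in redis when producers stopped
                            # (midpoint of the observed 3.5-3.7MM)
test_duration = 600         # seconds

ship_rate = shipped_at_end / test_duration  # sustained output rate
backlog = produced - shipped_at_end         # lines still unshipped
deficit = 7500 - ship_rate                  # how fast the backlog grows

print(ship_rate, backlog, deficit)  # 6000.0 900000 1500.0
```

At ~6000 lines/s shipped, a 5000 lines/s peak would be sustainable, but at 7500 lines/s the backlog grows by ~1500 lines/s for as long as the load lasts.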

I have posted our system info and configuration below, but any guidance here would be great!

System info:

  • filebeat: v7.1.1
  • Run as a k8s DaemonSet with 1 CPU and 512MB of memory

And our config file:

  logging:
    level: info
    to_syslog: true
    metrics.enabled: true
    metrics.period: 15s

  enabled: true

  queue.mem:
    events: 65536
    flush.min_events: 8192
    flush.timeout: 5s

  processors:
    - add_cloud_metadata:
        overwrite: true

  filebeat.inputs:
    - type: docker
      enabled: true
      containers.ids: ['*']
      exclude_lines: ['/healthz']
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
        - decode_json_fields:
            fields: ["message"]

  output.redis:
    hosts: ["redis:6379"]
    key: "%{[]:filebeat}"
    db: 0
    timeout: 5
    worker: 8
    bulk_max_size: 4096  # have also tried 1024 with similar results
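Since metrics logging is enabled every 15s, we have been estimating the sustained output rate from consecutive metrics snapshots, roughly like this sketch (the metric path follows the libbeat monitoring schema; the snapshot values here are made up for illustration):

```python
import json

# Two consecutive metrics snapshots, 15 s apart (metrics.period),
# as filebeat would report them in its periodic metrics log lines.
snap_t0 = json.loads('{"libbeat": {"output": {"events": {"acked": 1200000}}}}')
snap_t1 = json.loads('{"libbeat": {"output": {"events": {"acked": 1290000}}}}')
period = 15  # seconds between snapshots

def acked(snap):
    """Cumulative count of events acknowledged by the output."""
    return snap["libbeat"]["output"]["events"]["acked"]

rate = (acked(snap_t1) - acked(snap_t0)) / period
print(f"{rate:.0f} events/s acked")  # 6000 events/s acked
```

That per-interval rate is what we compare against the 7500 lines/s input rate to see whether the output is keeping up.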
