Hey all!
We are attempting to load test filebeat for shipping our Kubernetes logs. Our current max log line rate on any given node is 5000 lines/s, so we are load testing at 7500 lines/s. We do this by spinning up a single pod with 10 containers, each logging at 750 lines/s for around 10 minutes (TL;DR: there is some random delay, so we are not literally logging a constant 7500 lines/s, but on average that's what we're doing).
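For reference, each load-generating container runs something equivalent to this sketch (the function name and jitter factor are illustrative, not our exact generator):

```python
import random
import time

def log_lines(rate_per_s: float, total_lines: int, jitter: float = 0.2) -> None:
    """Print `total_lines` JSON lines at roughly `rate_per_s` on average,
    with a random +/- `jitter` factor applied to each inter-line delay."""
    base_delay = 1.0 / rate_per_s
    for i in range(total_lines):
        print(f'{{"seq": {i}, "msg": "load-test line"}}')
        time.sleep(base_delay * random.uniform(1 - jitter, 1 + jitter))

# Each of the 10 containers runs roughly:
#   log_lines(rate_per_s=750, total_lines=450_000)  # ~10 minutes
```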
So, this ends up producing exactly 4.5MM log lines over an approximately 10-minute load test. What I am noticing is that, long after the load-producing containers have stopped producing logs, only around 3.5 to 3.7MM log lines have been sent to the output (Redis, for now). Eventually, filebeat more or less catches up, sometimes shipping exactly 4.5MM logs to the output, sometimes falling a few thousand short.
We have also seen this failure mode with smaller load tests, such as 500 lines/s for 3MM total and 600 lines/s for 3.6MM total, which makes me think it's not related to the number of lines but to something else in our config or resources.
Also of note: we are setting queue.mem with max events set to 65536, so it is weird to me that, in some cases, ~1MM events are still being shipped after the inputs have been emptied.
Has anybody seen this before? Our ultimate goal is to reliably ship all 4.5MM events in near real time, not to run 1MM events behind. In a real-world situation, where we may be shipping 5000 lines/s for 3-4 hours during peak, would filebeat ever be able to catch up at this rate?
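To put rough numbers on the "catching up" concern: if filebeat sustains a lower ship rate than the produce rate, the backlog grows linearly for the duration of the load. A back-of-the-envelope calculation (the 6000 lines/s ship rate here is an assumption for illustration, not a measured figure):

```python
produce_rate = 7_500   # lines/s during the load test
ship_rate = 6_000      # hypothetical sustained output rate
duration_s = 600       # 10-minute test

# Lines still sitting unread in the log files when producers stop
backlog = (produce_rate - ship_rate) * duration_s
print(backlog)
```

With those assumed rates the backlog comes out around 900k lines, which is in the same ballpark as the ~1MM lag we observe.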
I have posted our system info and configuration below, but any guidance here would be great!
System info:
- filebeat v7.1.1
- Run as a k8s DaemonSet with 1 CPU and 512MB of memory
And our config file:

```yaml
logging:
  level: info
  to_syslog: true
  metrics.enabled: true
  metrics.period: 15s
http:
  enabled: true
queue.mem:
  events: 65536
  flush.timeout: 5s
  flush.min_events: 8192
processors:
  - add_cloud_metadata:
      overwrite: true
filebeat.inputs:
  - type: docker
    enabled: true
    containers.ids: ['*']
    exclude_lines: ['/healthz']
    processors:
      - add_kubernetes_metadata:
          in_cluster: true
      - decode_json_fields:
          fields: ["message"]
output.redis:
  hosts: ["redis:6379"]
  key: "%{[kubernetes.labels.app]:filebeat}"
  db: 0
  timeout: 5
  worker: 8
  bulk_max_size: 4096  # have also tried 1024 with similar results
```
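Since we have `http.enabled: true`, filebeat's monitoring endpoint (localhost:5066 by default) exposes `libbeat.output.events` counters, which could be used to estimate the sustained ship rate. A sketch with made-up counter values; the real numbers come from `curl -s http://localhost:5066/stats`:

```shell
# Hypothetical .libbeat.output.events.acked counters sampled 15 seconds apart;
# in practice read them from:
#   curl -s http://localhost:5066/stats
acked_t0=1000000
acked_t1=1090000
echo $(( (acked_t1 - acked_t0) / 15 ))   # sustained ship rate in events/s
```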