Filebeat messages missing under high load

We need help with the following problem:

We are receiving BIG-IP F5 request logs via F5 High Speed Logging in eight dedicated Filebeat pods in our production environment in the Azure cloud.
These Filebeat pods are connected to a Logstash instance that parses the JSON messages and sends them to Elasticsearch.
We see that a large share (>30%) of the JSON messages is missing. We cannot reproduce these missing messages in our test environment.
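For reference, our ingest path can be sketched as a minimal Logstash pipeline like the one below. This is an illustrative sketch, not our exact production config: the port, hosts, and index name are placeholders.

```
input {
  beats {
    port => 5044            # Filebeat -> Logstash (placeholder port)
  }
}

filter {
  json {
    source => "message"     # parse the F5 request log as JSON
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]          # placeholder host
    index => "f5-request-logs-%{+YYYY.MM.dd}"       # placeholder index name
  }
}
```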

When we capture the traffic between the F5 and Filebeat, we see that Filebeat advertises a TCP "Zero Window", meaning it cannot keep up with the incoming data and is telling the F5 to pause until Filebeat has freed its buffer. But because Filebeat does not send a non-zero window update in time, the F5 resets the connection.
As we cannot increase the zero-window timeout on the F5 without affecting all TCP traffic, we really want to solve this on the Filebeat side. We are running Filebeat/Logstash/Elasticsearch on Kubernetes on Azure Kubernetes Service (AKS), and we have tried increasing the resources and the number of pods, but without any result.
We also tried running Filebeat on separate VMs, with Logstash/Elasticsearch still on Docker, but with the same result: lost messages caused by connection resets.

We are out of ideas on how to make this setup handle the load properly.

We were finally able to solve this issue by increasing the Filebeat buffers (internal queue depth) and increasing the capacity of the Azure backend (Event Hub).
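For anyone hitting the same problem: the setting we mean is Filebeat's internal memory queue in `filebeat.yml`. A minimal sketch, assuming the memory-queue defaults; the numbers below are illustrative and must be sized against your own pod memory limits, they are not the exact values we used:

```yaml
# filebeat.yml -- enlarge the internal memory queue so Filebeat can
# absorb bursts from the F5 instead of advertising a zero TCP window.
queue.mem:
  events: 65536             # illustrative; much larger than the small default queue
  flush.min_events: 2048    # batch size handed to the output
  flush.timeout: 1s         # flush at least every second, even for partial batches

output.logstash:
  hosts: ["logstash:5044"]  # placeholder host
  bulk_max_size: 2048       # match the flush batch size
  worker: 4                 # parallel connections to Logstash
```

A deeper queue only buys time during bursts; the downstream (Logstash/Elasticsearch, and in our case the Azure Event Hub) still has to sustain the average throughput, which is why we also had to scale up the backend.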
