Filebeat daemonset losing messages

Mike_Williams · May 24, 2022, 2:36pm

Hi,

Some version of this question has been asked a few times over the years, but I've yet to find a real solution.

We've got filebeat deployed as a daemonset in an on-prem k8s cluster.

The config is very simple.

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      labels.dedot: true
      annotations.dedot: true

  - add_fields:
      target: kubernetes
      fields:
        cluster: ${KUBERNETES_CLUSTER}

output.logstash:
  ....

My problem appears to be twofold:

filebeat isn't fast enough to process all the logs.
filebeat is closing files when they are removed.

We're sending ~5k HTTP requests per second to 3 pods across 3 k8s nodes (about 1600 hits per second per pod) for 10 minutes, so about 3 million requests in total. Only about 2.3 million end up viewable in Kibana.
1600 per second is the most the service pods can handle in this load test. We've run the same test at least 5 times with the same result.
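
For reference, the loss implied by these figures can be worked out directly. This is a small Python sketch that uses only the numbers quoted above (5k requests per second for 10 minutes, about 2.3 million visible in Kibana); the function name loss_fraction is purely illustrative.

```python
def loss_fraction(sent: int, received: int) -> float:
    """Return the fraction of sent events that never arrived."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent


# Figures taken from the load test described above.
requests_per_second = 5_000
duration_seconds = 10 * 60
sent = requests_per_second * duration_seconds  # 3,000,000 requests
received = 2_300_000                           # approximate count seen in Kibana

print(f"sent={sent} received={received} loss={loss_fraction(sent, received):.1%}")
```

With these inputs the loss works out to a little over 23%.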

To expand on the points above: it appears that filebeat can't keep up with the rate at which messages are generated, and it then closes the log files once something (docker?) rotates and deletes them.

2022-05-24T13:53:02.904Z	INFO	[input.harvester]	log/harvester.go:332	File was removed. Closing because close_removed is enabled.	{"input_id": "f6919c35-ef1c-4970-89f8-f2a8e5d25b2f", "source": "/var/lib/docker/containers/220fb0a62a25377d0d698e9fd23bd8c1712064d691cc85031a9cd22fca9f5a3f/220fb0a62a25377d0d698e9fd23bd8c1712064d691cc85031a9cd22fca9f5a3f-json.log", "state_id": "native::2760806-2064", "finished": false, "os_id": "2760806-2064", "old_source": "/var/lib/docker/containers/220fb0a62a25377d0d698e9fd23bd8c1712064d691cc85031a9cd22fca9f5a3f/220fb0a62a25377d0d698e9fd23bd8c1712064d691cc85031a9cd22fca9f5a3f-json.log", "old_finished": true, "old_os_id": "2760806-2064", "harvester_id": "22cbe234-9f52-46da-bd86-859092a75748"}

220fb0a62a25377d0d698e9fd23bd8c1712064d691cc85031a9cd22fca9f5a3f is the ID of the container serving the HTTP requests on this node.

I'd like to set close_removed: false, but I don't know where to set it, as I'm not configuring a log input under filebeat.inputs.

In earlier testing we were running at about 15k HTTP requests per second across the 3 pods, and the approximate loss rate was the same: very roughly 25-30%.
My earlier statement that filebeat isn't fast enough was inaccurate. We know it can handle something like 11k messages per second, so why it then can't handle 5k messages per second is awfully confusing.

The k8s nodes aren't struggling, the Elasticsearch cluster isn't struggling.

We do have the filebeat metrics available, and we do have the "Non-zero metrics in the last 30s" log entries from filebeat.
I don't know what to make of the stats metrics, though.
For example, for a pod that has been running for 49 minutes, is this bad?

$ kubectl exec filebeat-filebeat-9bbr7 -n infra-monitoring -- curl -s -XGET 'localhost:5066/stats?pretty' | jq .beat.cgroup.cpu.stats
{
  "periods": 27352,
  "throttled": {
    "ns": 839394825410,
    "periods": 3491
  }
}

periods does grow fast while load testing is happening, and throttled sounds bad.
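
One way to read these counters, sketched in Python. It assumes the fields mirror the Linux CFS bandwidth statistics, where periods counts elapsed enforcement periods, throttled.periods counts the periods in which the cgroup hit its CPU quota, and throttled.ns is the total time spent throttled. That mapping is an assumption about what filebeat reports, and the helper name summarize_throttling is illustrative only.

```python
import json

# Output of the /stats call above, copied verbatim.
stats_json = """
{
  "periods": 27352,
  "throttled": {
    "ns": 839394825410,
    "periods": 3491
  }
}
"""


def summarize_throttling(stats: dict) -> dict:
    """Summarize CPU throttling from cgroup CPU stats (assumed CFS semantics)."""
    periods = stats["periods"]
    throttled = stats["throttled"]
    return {
        # Share of enforcement periods in which the quota was exhausted.
        "throttled_ratio": throttled["periods"] / periods if periods else 0.0,
        # Total throttled time, converted from nanoseconds to seconds.
        "throttled_seconds": throttled["ns"] / 1e9,
    }


summary = summarize_throttling(json.loads(stats_json))
print(
    f"throttled in {summary['throttled_ratio']:.1%} of periods, "
    f"{summary['throttled_seconds']:.0f} s throttled in total"
)
```

Under that reading, the container was throttled in roughly 13% of its periods, for about 839 seconds in total. A throttled count that keeps growing during the load test would suggest the container is hitting its CPU limit.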

Any help would be appreciated.

Mike_Williams · May 25, 2022, 10:43am

By changing the filebeat config and resource limits as below, I was able to get the difference between the number of requests reportedly made and the number that reached Elasticsearch down to 0.38%.

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      appenders:
        - type: config
          config:
            close_removed: false
            clean_removed: false

I also increased the CPU resource limit on the filebeat containers to 3 (3000m), although 2 would probably have been enough.

At this point I suspect the remaining loss is in either Kubernetes or Docker.

system · June 22, 2022, 12:43pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
