Filebeat autodiscover mode is flooding my kubernetes API

Hello,

I'm working on a managed Kubernetes cluster from a cloud provider that offers limited K8S API performance (slow master nodes).

I have installed filebeat 8.5.1 with the official Helm chart.

The provider's admins say that my filebeat configuration (deployed inside the cluster) relies too heavily on K8S API calls: it is flooding the API servers and their admin tasks have response-time problems, even though there are only 3 worker nodes and relatively few logs (the app is not yet in production).

I have already lowered the pressure with max_procs=1 and kube_client_options.qps=1, but this is not enough.

I don't really know how to go further, so I have a few questions:

  • are there any design errors in the filebeat.yml config below?
  • do I risk losing messages with kube_client_options.qps=1, or only the additional kubernetes fields (which I don't care that much about)? Will there be a log entry highlighting this problem if it happens?
  • is autodiscover mandatory to achieve the filtering I'm trying to get? If not, what is the legacy alternative? (I've sketched what I imagine it looks like right after this list.)
  • are there better architecture choices that could lower K8S API pressure, for example using logstash and filtering there?
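
For the third bullet, here is roughly what I imagine the non-autodiscover alternative would look like: a plain container input plus add_kubernetes_metadata, with the filtering moved to a drop_event processor. This is only a sketch based on my reading of the docs (it covers just the 'server'/'teleport' case, and the multiline/dissect settings would move in as well), so please tell me if it is wrong or would not actually reduce the API calls:

    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"
          # my guess at the equivalent of the first autodiscover template:
          # keep only the containers I care about, drop everything else
          - drop_event:
              when:
                not:
                  or:
                    - equals:
                        kubernetes.container.name: server
                    - equals:
                        kubernetes.container.name: teleport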

Many thanks in advance for any input :pray:

    max_procs: 1 # default = virtual CPU count (8 in production, too much for OVH API servers)
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          kube_client_options:
            qps: 1
            burst: 10
          templates:
            - condition:
                or:
                  - equals.kubernetes.container.name: server # all produced myapp containers
                  - equals.kubernetes.container.name: teleport # same message start
              config:
                - type: container
                  paths:
                    - /var/log/containers/*-${data.kubernetes.container.id}.log
                  multiline:
                    pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}" # starts with our date pattern
                    negate: true
                    match: after
                  processors:
                    - add_kubernetes_metadata:
                        host: ${NODE_NAME}
                        matchers:
                          - logs_path:
                              logs_path: "/var/log/containers/"
                    - dissect:
                        # searchable fields are defined here: https://www.elastic.co/guide/en/ecs/8.7/ecs-field-reference.html
                        # 2023-04-13T09:44:59.013Z  INFO user[1f4ba76d-c76f-4d4f-bd91-453b5313708d] 1 --- [io-8080-exec-10] c.p.b.m.t.a.LyraMessageBuilder: Start LyraMessageBuilder.build(..)
                        tokenizer: "%{event.start} %{log.level} user[%{process.real_user.id}] %{log.syslog.msgid} --- [%{log.syslog.procid}] %{log.origin.function}: %{event.reason}"
                        field: "message"
                        target_prefix: ""
                        ignore_failure: true
                        overwrite_keys: true
                        trim_values: "all" # trim leading and trailing whitespace from values
            - condition: # other non my-app pods
                and:
                  - not.equals.kubernetes.container.name: apm-server
                  - not.equals.kubernetes.container.name: autoscaler # kube-dns-autoscaler
                  - not.equals.kubernetes.container.name: aws-cluster-autoscaler
                  - not.equals.kubernetes.container.name: calico-node
                  - not.equals.kubernetes.container.name: cert-manager-webhook-ovh
                  - not.equals.kubernetes.container.name: coredns
                  - not.equals.kubernetes.container.name: csi-snapshotter
                  - not.equals.kubernetes.container.name: filebeat
                  - not.equals.kubernetes.container.name: ingress-nginx-default-backend
                  - not.equals.kubernetes.container.name: logstash
                  - not.equals.kubernetes.container.name: metricbeat
                  - not.equals.kubernetes.container.name: pgadmin4
                  - not.equals.kubernetes.container.name: server # <-- above condition
                  - not.equals.kubernetes.container.name: teleport # <-- above condition
                  - not.equals.kubernetes.container.name: wormhole
              config:
                - type: container
                  paths:
                    - /var/log/containers/*-${data.kubernetes.container.id}.log
                  processors:
                    - add_kubernetes_metadata:
                        host: ${NODE_NAME}
                        matchers:
                          - logs_path:
                              logs_path: "/var/log/containers/"

Thank you for your time.

This is a 3-worker-node cluster with 15 GB RAM per node, so a very modest cluster.

Most of these container names match one pod per node. The 'server' name matches more, around 5 pods per node at most.

Another question on top of the others: can I measure the K8S API pressure in some way, for example by raising the log level?

Yeah, measuring API usage is possible. I have an API server metrics dashboard in Grafana that came with kube-prometheus-stack (helm-charts/apiserver.yaml at main · prometheus-community/helm-charts · GitHub), and metricbeat has an apiserver metricset in its kubernetes module that I haven't evaluated before: Kubernetes module | Metricbeat Reference [8.7] | Elastic.
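
If you go the metricbeat route, the config I'd expect is something along these lines; I haven't run it myself, so treat it as a sketch and adjust the service-account/TLS paths to your cluster:

    metricbeat.modules:
      - module: kubernetes
        metricsets: ["apiserver"]
        # talk to the API server through the in-cluster service endpoint
        hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        ssl.certificate_authorities:
          - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        period: 30s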

Thanks Brian, I'll look into that. It will take some time, and I'm confident it will show me the actual pressure, but it won't solve my problem, since the admins told me specifically that 98% of the calls come from the filebeat pods.

Back to my initial questions: what could be improved to lower the Kubernetes API pressure?

Would the scope field help? The documentation says its values are node or cluster.

Is it cluster by default, even though resource is pod by default?
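
Concretely, I'm wondering whether setting it explicitly on the provider, something like this, would reduce the number of watches (just a guess from the docs, not tested yet):

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME} # not sure if this is needed on top of scope
          scope: node        # only watch pods on this node instead of the whole cluster (my reading of the docs)
          kube_client_options:
            qps: 1
            burst: 10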
