Hello,
I'm working on a managed Kubernetes cluster at a cloud provider whose K8S API performance is limited (slow master nodes).
I have installed Filebeat 8.5.1 with the official Helm chart.
The provider says that my Filebeat configuration (deployed inside the cluster) relies too heavily on K8S API calls: it is flooding the API servers, and admin tasks suffer from response-time problems, even though there are only 3 worker nodes and relatively few logs (the app is not yet in production).
I already lowered the pressure with max_procs=1 and kube_client_options.qps=1, but this is not enough.
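For reference, those two settings are applied through the chart values, roughly like this (trimmed to the relevant keys; the filebeatConfig layout is written from memory, so take it as a sketch of where the tuning lives rather than my exact values.yaml):

daemonset:
  filebeatConfig:
    filebeat.yml: |
      max_procs: 1
      filebeat.autodiscover:
        providers:
          - type: kubernetes
            kube_client_options:
              qps: 1
              burst: 10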
I don't really know how to go further, so I have a few questions:
- Are there any design errors in the filebeat.yml config below?
- With kube_client_options.qps=1, do I risk losing messages, or only the additional Kubernetes metadata fields (which I don't care that much about)? And will there be a log entry highlighting the problem if it happens?
- Is autodiscover mandatory to achieve the filtering I'm trying to get? If not, what is the legacy alternative? (There is a sketch of what I imagine right after this list.)
- Are there better architecture choices that could lower the K8S API pressure, for example shipping everything to Logstash and filtering there? (See the second sketch at the very end of this post.)
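To make the autodiscover question more concrete, this is the kind of "legacy" setup I have in mind: a static container input plus filtering with processors instead of autodiscover templates. It is only an untested sketch, and I am not even sure it lowers the API pressure, since add_kubernetes_metadata presumably still has to watch the API:

filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
      - drop_event: # keep only the myapp containers, as in the first template below
          when:
            not:
              or:
                - equals:
                    kubernetes.container.name: server
                - equals:
                    kubernetes.container.name: teleport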
Many thanks in advance for any input. Here is the full filebeat.yml as currently deployed:
max_procs: 1 # default = number of virtual CPUs (8 in production, too much for the OVH API servers)

filebeat.autodiscover:
  providers:
    - type: kubernetes
      kube_client_options:
        qps: 1
        burst: 10
      templates:
        - condition:
            or:
              - equals.kubernetes.container.name: server   # all containers produced by myapp
              - equals.kubernetes.container.name: teleport # same message start
          config:
            - type: container
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
              multiline:
                pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}" # starts with our date pattern
                negate: true
                match: after
              processors:
                - add_kubernetes_metadata:
                    host: ${NODE_NAME}
                    matchers:
                      - logs_path:
                          logs_path: "/var/log/containers/"
                - dissect:
                    # searchable fields are defined here: https://www.elastic.co/guide/en/ecs/8.7/ecs-field-reference.html
                    # 2023-04-13T09:44:59.013Z INFO user[1f4ba76d-c76f-4d4f-bd91-453b5313708d] 1 --- [io-8080-exec-10] c.p.b.m.t.a.LyraMessageBuilder: Start LyraMessageBuilder.build(..)
                    tokenizer: "%{event.start} %{log.level} user[%{process.real_user.id}] %{log.syslog.msgid} --- [%{log.syslog.procid}] %{log.origin.function}: %{event.reason}"
                    field: "message"
                    target_prefix: ""
                    ignore_failure: true
                    overwrite_keys: true
                    trim_values: "all" # trim leading and trailing whitespace from values
        - condition: # all other, non-myapp pods
            and:
              - not.equals.kubernetes.container.name: apm-server
              - not.equals.kubernetes.container.name: autoscaler # kube-dns-autoscaler
              - not.equals.kubernetes.container.name: aws-cluster-autoscaler
              - not.equals.kubernetes.container.name: calico-node
              - not.equals.kubernetes.container.name: cert-manager-webhook-ovh
              - not.equals.kubernetes.container.name: coredns
              - not.equals.kubernetes.container.name: csi-snapshotter
              - not.equals.kubernetes.container.name: filebeat
              - not.equals.kubernetes.container.name: ingress-nginx-default-backend
              - not.equals.kubernetes.container.name: logstash
              - not.equals.kubernetes.container.name: metricbeat
              - not.equals.kubernetes.container.name: pgadmin4
              - not.equals.kubernetes.container.name: server   # <-- already covered by the first template above
              - not.equals.kubernetes.container.name: teleport # <-- already covered by the first template above
              - not.equals.kubernetes.container.name: wormhole
          config:
            - type: container
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
              processors:
                - add_kubernetes_metadata:
                    host: ${NODE_NAME}
                    matchers:
                      - logs_path:
                          logs_path: "/var/log/containers/"