[Filebeat] Filebeat with K8S autodiscover using hints keeps refreshing all pod configs every 10s


We noticed this issue quite a long time ago (high CPU usage on some Filebeat instances), but we finally had time to dig into it on a more recent version of Filebeat (8.10.1).

From what we recently noticed, our logs keep showing that Filebeat gets stop/start events for all pods every 10s. From what I see in the code, this forces a reload of the input, and while the harvesters aren't restarted if there is no change, there is still quite a lot of processing.

From my checks, this seems to be the code that keeps being called: https://github.com/elastic/beats/blob/v8.10.1/libbeat/autodiscover/providers/kubernetes/pod.go#L185. However, 99% of our pods aren't updated every 10 seconds.

So I checked around a bit more and noticed this condition:
https://github.com/elastic/beats/blob/v8.10.1/libbeat/autodiscover/providers/kubernetes/pod.go#L155. From what I understand, if we have a node watcher (our scope is "node", so we do have one) and either hints or metaconf.node is enabled, this adds a watcher on the pods for every node update.

I ran kubectl get nodes --watch and noticed that the nodes are indeed updated every 10s, which seems to be the source of these repeated updates. This comes from the kubelet and its nodeStatusUpdateFrequency setting, which defaults to 10s (see the Kubelet Configuration (v1beta1) page in the Kubernetes documentation).

From what I understand, the node watcher is mostly there to keep the node labels/annotations that are added to the logs up to date. However, these updates mostly change the status field of the node.
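To illustrate the point about status-only updates, here is a minimal Go sketch of a check that ignores a node update unless its labels or annotations changed. This is not how Beats currently behaves and not code from the Beats repository: the nodeMeta type, its LastHeartbeat field, and the metadataChanged helper are all hypothetical, simplified stand-ins for the real Kubernetes Node object.

```go
package main

import "fmt"

// nodeMeta is a hypothetical, simplified stand-in for a Kubernetes Node.
// LastHeartbeat represents the status fields that the kubelet refreshes.
type nodeMeta struct {
	Labels        map[string]string
	Annotations   map[string]string
	LastHeartbeat string
}

// equalStringMaps reports whether two maps hold the same key/value pairs.
// A nil map and an empty map are treated as equal.
func equalStringMaps(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if other, ok := b[k]; !ok || other != v {
			return false
		}
	}
	return true
}

// metadataChanged returns true only when labels or annotations differ.
// Status-only changes (here: LastHeartbeat) are deliberately ignored.
func metadataChanged(oldNode, newNode nodeMeta) bool {
	return !equalStringMaps(oldNode.Labels, newNode.Labels) ||
		!equalStringMaps(oldNode.Annotations, newNode.Annotations)
}

func main() {
	before := nodeMeta{Labels: map[string]string{"zone": "a"}, LastHeartbeat: "t1"}

	// Same labels, only the heartbeat moved: no pod refresh would be needed.
	heartbeatOnly := nodeMeta{Labels: map[string]string{"zone": "a"}, LastHeartbeat: "t2"}
	fmt.Println(metadataChanged(before, heartbeatOnly)) // false

	// A label changed: pod metadata would need to be refreshed.
	relabeled := nodeMeta{Labels: map[string]string{"zone": "b"}, LastHeartbeat: "t3"}
	fmt.Println(metadataChanged(before, relabeled)) // true
}
```

With a filter like this in front of the pod update, the 10s heartbeat from the kubelet would not cause any reload, while a real label or annotation change still would.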

The condition seems different in the elastic-agent-autodiscover lib, where only the config value of the node metadata and the watcher are checked, not the presence of hints: https://github.com/elastic/elastic-agent-autodiscover/blob/main/kubernetes/metadata/metadata.go#L101

This makes it impossible to disable the node metadata watcher if we want to use hints.
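The difference between the two conditions, as I understand them, can be sketched in Go as follows. This is a paraphrase for illustration only, not a copy of the code in either repository: the function and parameter names are hypothetical.

```go
package main

import "fmt"

// beatsStyleCondition paraphrases the check as I read it in the Beats pod
// provider: a node watcher exists AND (hints OR node metadata is enabled).
// Names are hypothetical, not taken from the source.
func beatsStyleCondition(hasNodeWatcher, hintsEnabled, nodeMetaEnabled bool) bool {
	return hasNodeWatcher && (hintsEnabled || nodeMetaEnabled)
}

// metadataOnlyCondition paraphrases the check as I read it in
// elastic-agent-autodiscover: only the watcher and the node metadata
// setting matter, hints are not considered.
func metadataOnlyCondition(hasNodeWatcher, nodeMetaEnabled bool) bool {
	return hasNodeWatcher && nodeMetaEnabled
}

func main() {
	// Our setup: scope "node" (so a node watcher exists), hints enabled,
	// node metadata disabled.
	hasNodeWatcher, hintsEnabled, nodeMetaEnabled := true, true, false

	// Pods are still refreshed on every node update.
	fmt.Println(beatsStyleCondition(hasNodeWatcher, hintsEnabled, nodeMetaEnabled)) // true

	// With the hints check removed, disabling node metadata is enough.
	fmt.Println(metadataOnlyCondition(hasNodeWatcher, nodeMetaEnabled)) // false
}
```

Under the first form, enabling hints alone is enough to subscribe the pods to node updates, regardless of the node metadata setting.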

Am I understanding all this correctly, and is it on purpose that Filebeat reacts to all these updates? We currently have 4 GB of logs from Filebeat alone that consist only of lines like this one, every 10s:

{"log.level":"info","@timestamp":"2023-11-03T08:27:23.011Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":174},"message":"Configured paths: [`

As we do not use the node metadata, is there a way to fully disable it without disabling hints? Or do we need to work around this by moving the few templates we define through hints into the main configuration and loading them differently?


Small addition:

I rebuilt the Filebeat binary with the hints check removed from the condition and the node metadata disabled. On a busier cluster, CPU usage went from 90-95% (as reported in the Elastic Beats monitoring panels) to 14%. I know my "fix" probably breaks things in other Beats, but it shows the impact this has on our cluster.

I am probably missing the reason why hints are checked there even when we disable this metadata at the input level, but this leads to quite a large overhead, in our cluster at least.
