Elastic Agent with Kubernetes integration conflicting with System integration

Hello,
I have installed a Fleet Server and I am running Elastic Agent 8.15.5 as a DaemonSet on Kubernetes.

In my agent policy I have added the System and Kubernetes integrations together, but not all Kubernetes metrics show up in the Kubernetes dashboards (nodes, namespaces, DaemonSets, etc. are missing).
If I remove the System integration, everything works as expected in the Kubernetes dashboards.

In the agent info I can see these errors when I use both integrations together:

Unit state changed kubernetes/metrics-default-kubernetes/metrics-kubelet-xxxxxx (STARTING->FAILED): Failed: pid 'xxxxxx' exited with code '-1'

Unit state changed kubernetes/metrics-default-kubernetes/metrics-events-xxxxxx (STARTING->FAILED): Failed: pid 'xxxxxx' exited with code '-1'

Unit state changed kubernetes/metrics-default-kubernetes/metrics-kube-apiserver-xxxxxx (STARTING->FAILED): Failed: pid 'xxxxxx' exited with code '-1'

Unit state changed kubernetes/metrics-default-kubernetes/metrics-kube-state-metrics-xxxxxx (STARTING->FAILED): Failed: pid 'xxxxxx' exited with code '-1'

Unit state changed kubernetes/metrics-default (HEALTHY->FAILED): Failed: pid 'xxxxxx' exited with code '-1'

Component state changed kubernetes/metrics-default (HEALTHY->FAILED): Failed: pid 'xxxxxx' exited with code '-1'

Do you know if there are any conflicts when using both together? And how can I monitor the system metrics of the virtual machines where I run Kubernetes?

Thanks in advance!

Can you check the memory usage of the pod? Is it hitting the memory limit in the manifest?
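For example, you could check it like this (a sketch — the `kube-system` namespace and `app=elastic-agent` label are assumptions; adjust them to match your manifest, and `kubectl top` requires metrics-server):

```shell
# Show current CPU/memory usage of the Elastic Agent pods
kubectl top pods -n kube-system -l app=elastic-agent

# Check whether any agent container was OOMKilled after hitting its limit
kubectl get pods -n kube-system -l app=elastic-agent \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```

An `OOMKilled` termination reason would match the `exited with code '-1'` unit failures above.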


Yes, it is hitting the memory limit, and the CPU limit as well.
So does Elastic Agent as a DaemonSet in Kubernetes really need all these resources? That's 300m CPU and 700MB memory for each pod on the nodes.
I migrated from Filebeat, which didn't need this many resources.

Do you have any idea?

We have continued to improve memory consumption on Kubernetes and would recommend moving to 8.17 if possible.

Memory consumption can depend on the number of pods, tasks, what integrations you have configured, what data you are pulling, and if you have modified the output settings.

The fact that it works with the System integration disabled might mean you only need to increase the memory limit by ~150-200 MB to get it working again.
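As a sketch, that would mean raising the `resources` block of the elastic-agent container in the DaemonSet manifest along these lines (the 300m CPU / 700Mi memory figures come from this thread; the 900Mi limit is an assumed example, not a tested recommendation):

```yaml
# elastic-agent container in the DaemonSet spec
resources:
  limits:
    cpu: 300m
    memory: 900Mi   # raised from 700Mi by ~200Mi of headroom
  requests:
    cpu: 300m
    memory: 700Mi
```

After editing, re-apply the manifest (e.g. `kubectl apply -f elastic-agent-managed-kubernetes.yaml`) and watch whether the kubernetes/metrics units stay healthy.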