Metricbeat Pod, Container, and Volume Metricsets for Kubernetes Module Not Working

Hey there!

I've been working on setting up Metricbeat within a local Kubernetes cluster to collect metrics from a microservices system, specifically from the onlineboutique application. My Kubernetes cluster is running on Rancher Desktop, and I'm deploying applications using Skaffold. I've also deployed Elasticsearch and Kibana, with Metricbeat configured to ship metrics to Elasticsearch for visualization in Kibana.

After deploying Metricbeat as a DaemonSet to gather metrics across the cluster, I noticed that only system metrics are being collected successfully; the pod, container, and volume metricsets are not working as expected. Whenever I run ./metricbeat test modules, the system module reports back as OK, but the pod, container, and volume metricsets report ERROR timeout waiting for an event.

  • Metricbeat Version: 8.13.0
  • Kubernetes Version: 1.28.7
  • Rancher Desktop Version: 1.13.1
  • Deployment Method: Skaffold

Steps Already Taken:

  • Ensured Metricbeat has broad permissions through a ClusterRole with get, list, watch verbs across all resources.
  • Checked Metricbeat, Kubelet, and Kubernetes API server logs but didn't find clear indicators of the issue.

I suspect the issue might be related to connectivity with the Kubelet API or perhaps a specific configuration requirement I might be missing for pod and container metrics collection. I've attached the relevant parts of my configuration files for reference.

Could anyone provide insights or suggestions on what might be causing this issue and how to resolve it? Since I'm fairly new to the Elastic Stack and Kubernetes, any guidance or recommendations would be greatly appreciated!

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: logging
  labels:
    k8s-app: metricbeat
spec:
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      serviceAccountName: metricbeat
      containers:
        - name: metricbeat
          image: docker.elastic.co/beats/metricbeat:8.13.0
          args: [
            "-c", "/etc/metricbeat.yml",
            "-e",
          ]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - name: config
              mountPath: /etc/metricbeat.yml
              readOnly: true
              subPath: metricbeat.yml
            - name: dockersock
              mountPath: /var/run/docker.sock
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: cgroup
              mountPath: /host/sys/fs/cgroup
              readOnly: true
            - name: modules
              mountPath: /usr/share/metricbeat/modules.d
              readOnly: true
      volumes:
        - name: config
          configMap:
            defaultMode: 0600
            name: metricbeat-config
        - name: proc
          hostPath:
            path: /proc
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: modules
          configMap:
            defaultMode: 0600
            name: metricbeat-modules-config


----


apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-modules-config
  namespace: logging
  labels:
    k8s-app: metricbeat
data:
  kubernetes.yml: |-
    - module: kubernetes
      metricsets: ["pod", "container", "system", "volume", "apiserver"]
      period: 5s
      hosts: ["https://${NODE_NAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
      in_cluster: true
      add_metadata: true

----

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-config
  namespace: logging
data:
  metricbeat.yml: |
    metricbeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false
    setup.template.settings:
      index.number_of_shards: 1
      index.codec: best_compression
    metricbeat.autodiscover:
      providers:
      - type: kubernetes
        scope: cluster
        node: ${NODE_NAME}
        hints.enabled: false
        templates:
          - config:
              - module: kubernetes
                metricsets: ["pod", "container", "system", "volume", "apiserver"]
                period: 5s
                hosts: ["https://${NODE_NAME}:10250"]
                bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                ssl.verification_mode: "none"
                in_cluster: true
                add_metadata: true
    output.elasticsearch:
      hosts: [ "elasticsearch.logging:9200" ]


----

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metricbeat
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["*"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [ "coordination.k8s.io" ]
    resources: [ "leases" ]
    verbs: [ "get", "list", "watch", "create", "update", "patch", "delete" ]

When testing the modules:


./metricbeat test modules -e
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.445Z","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure","file.name":"instance/beat.go","file.line":811},"message":"Home path: [/usr/share/metricbeat] Config path: [/usr/share/metricbeat] Data path: [/usr/share/metricbeat/data] Logs path: [/usr/share/metricbeat/logs]","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.445Z","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure","file.name":"instance/beat.go","file.line":819},"message":"Beat ID: a36bcdf9-7ac6-438f-8a28-668ca0b8a5ef","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-04-06T11:09:35.448Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetadata).fetchMetadata","file.name":"add_cloud_metadata/providers.go","file.line":173},"message":"add_cloud_metadata: received error failed requesting digitalocean metadata: Get \"http://169.254.169.254/metadata/v1.json\": dial tcp 169.254.169.254:80: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-04-06T11:09:35.449Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetadata).fetchMetadata","file.name":"add_cloud_metadata/providers.go","file.line":173},"message":"add_cloud_metadata: received error failed requesting azure metadata: Get \"http://169.254.169.254/metadata/instance/compute?api-version=2021-02-01\": dial tcp 169.254.169.254:80: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-04-06T11:09:35.450Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetadata).fetchMetadata","file.name":"add_cloud_metadata/providers.go","file.line":173},"message":"add_cloud_metadata: received error failed requesting openstack metadata: Get \"http://169.254.169.254/2009-04-04/meta-data/instance-id\": dial tcp 169.254.169.254:80: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-04-06T11:09:35.451Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetadata).fetchMetadata","file.name":"add_cloud_metadata/providers.go","file.line":173},"message":"add_cloud_metadata: received error failed requesting gcp metadata: Get \"http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json\": dial tcp 169.254.169.254:80: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-04-06T11:09:35.452Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetadata).fetchMetadata","file.name":"add_cloud_metadata/providers.go","file.line":173},"message":"add_cloud_metadata: received error failed requesting openstack metadata: Get \"https://169.254.169.254/2009-04-04/meta-data/placement/availability-zone\": dial tcp 169.254.169.254:443: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-04-06T11:09:35.453Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetad
ata).fetchMetadata","file.name":"add_cloud_metadata/providers.go","file.line":173},"message":"add_cloud_metadata: received error failed requesting hetzner metadata: Get \"http://169.254.169.254/hetzner/v1/metadata/availability-zone\": dial tcp 169.254.169.254:80: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-04-06T11:09:35.481Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.486Z","log.logger":"kubernetes","log.origin":{"function":"github.com/elastic/elastic-agent-autodiscover/kubernetes.DiscoverKubernetesNode","file.name":"kubernetes/util.go","file.line":130},"message":"kubernetes: Node pf4bwj30 discovered by in cluster pod node query","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-04-06T11:09:35.495Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.497Z","log.logger":"kubernetes","log.origin":{"function":"github.com/elastic/elastic-agent-autodiscover/kubernetes.DiscoverKubernetesNode","file.name":"kubernetes/util.go","file.line":130},"message":"kubernetes: Node pf4bwj30 discovered by in cluster pod node query","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-04-06T11:09:35.502Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.503Z","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util.AddClusterECSMeta","file.name":"util/kubernetes.go","file.line":745},"message":"could not retrieve cluster metadata: fail to get kubernetes cluster metadata: unable to retrieve cluster identifiers","service.name":"metricbeat","ecs.version":"1.6.0"}        
{"log.level":"warn","@timestamp":"2024-04-06T11:09:35.503Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.505Z","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util.AddClusterECSMeta","file.name":"util/kubernetes.go","file.line":745},"message":"could not retrieve cluster metadata: fail to get kubernetes cluster metadata: unable to retrieve cluster identifiers","service.name":"metricbeat","ecs.version":"1.6.0"}        
{"log.level":"warn","@timestamp":"2024-04-06T11:09:35.505Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:35.507Z","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util.AddClusterECSMeta","file.name":"util/kubernetes.go","file.line":745},"message":"could not retrieve cluster metadata: fail to get kubernetes cluster metadata: unable to retrieve cluster identifiers","service.name":"metricbeat","ecs.version":"1.6.0"}        
kubernetes...
  pod...{"log.level":"warn","@timestamp":"2024-04-06T11:09:35.809Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-04-06T11:09:38.446Z","log.logger":"add_cloud_metadata","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetada
ta).init.func1","file.name":"add_cloud_metadata/add_cloud_metadata.go","file.line":100},"message":"add_cloud_metadata: hosting provider type not detected.","service.name":"metricbeat","ecs.version":"1.6.0"}

    error... ERROR timeout waiting for an event
  container...
    error... ERROR timeout waiting for an event
  system...OK
    result:
    {
     "@timestamp": "2024-04-06T11:09:45.509Z",
     "event": {
      "dataset": "kubernetes.system",
      "duration": 130385,
      "module": "kubernetes"
     },
     "kubernetes": {
      "node": {
       "name": "pf4bwj30"
      },
      "system": {
       "container": "kubelet",
       "cpu": {
        "usage": {
         "core": {
          "ns": 15900525606776
         },
         "nanocores": 326532903
        }
       },
       "memory": {
        "majorpagefaults": 6681,
        "pagefaults": 37260172,
        "rss": {
         "bytes": 22427107328
        },
        "usage": {
         "bytes": 24825720832
        },
        "workingset": {
         "bytes": 24326336512
        }
       },
       "start_time": "2024-04-05T16:35:30Z"
      }
     },
     "metricset": {
      "name": "system",
      "period": 20000
     },
     "service": {
      "address": "https://pf4bwj30:10250/stats/summary",
      "type": "kubernetes"
     }
    }

  volume...
    error... ERROR timeout waiting for an event
  apiserver...{"log.level":"warn","@timestamp":"2024-04-06T11:09:50.510Z","log.logger":"tls","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/tlscommon.(*TLSConfig).ToConfig","file.name":"tlscommon/tls_config.go","file.line":107},"message":"SSL/TLS verifications disabled.","service.name":"metricbeat","ecs.version":"1.6.0"}
OK
    result:
    {
     "@timestamp": "2024-04-06T11:09:50.510Z",
     "event": {
      "dataset": "kubernetes.apiserver",
      "duration": 146790871,
      "module": "kubernetes"
     },
     "kubernetes": {
      "apiserver": {
       "major": {
        "version": "1"
       },
       "minor": {
        "version": "28"
       },
       "request": {
        "code": "200",
        "component": "apiserver",
        "count": 4,
        "group": "apiextensions.k8s.io",
        "resource": "customresourcedefinitions",
        "scope": "resource",
        "verb": "GET",
        "version": "v1"
       }
      }
     },
     "metricset": {
      "name": "apiserver",
      "period": 20000
     },
     "service": {
      "address": "https://pf4bwj30:10250/metrics",
      "type": "kubernetes"
     }
    }

Hi @KarwendelMann, welcome to the community.

I would exec into a pod and then try to curl the kubelet endpoints to figure out what is going on. My first question: are you sure the kubelet is serving HTTPS? Did you try HTTP?
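
Something like this, assuming the DaemonSet above (the pod name is a placeholder, substitute a real one from kubectl -n logging get pods; 10255 is the kubelet's legacy read-only HTTP port and may well be disabled on your node):

kubectl -n logging exec -it metricbeat-xxxxx -- bash

# inside the container; NODE_NAME is injected by your DaemonSet spec
# HTTPS on 10250 – /stats/summary is the endpoint the pod/container/volume metricsets read
curl --insecure -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://${NODE_NAME}:10250/stats/summary"

# plain HTTP on the read-only port, to rule HTTPS out
curl "http://${NODE_NAME}:10255/stats/summary"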

Can you just try the default recommended config from here? Specifically, from:

https://raw.githubusercontent.com/elastic/beats/8.13/deploy/kubernetes/metricbeat-kubernetes.yaml

You can comment out the kube-state-metrics stuff
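
For reference, the kubernetes module block in that manifest looks roughly like this (I'm quoting the 8.13 branch from memory, so double-check against the link above):

- module: kubernetes
  metricsets:
    - node
    - system
    - pod
    - container
    - volume
  period: 10s
  host: "${NODE_NAME}"
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.verification_mode: "none"

Note that it sets host: "${NODE_NAME}" in addition to hosts, and keeps the module config in modules.d only, rather than duplicating it under autodiscover the way your current setup does.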

Also, per Metrics in Kubernetes:

In most cases metrics are available on the /metrics endpoint of the HTTP server. For components that don't expose the endpoint by default, it can be enabled using the --bind-address flag.

What do you get from:

curl --insecure -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://pf4bwj30:10250/metrics

Thanks for your answer!

I tested the /metrics endpoint, but unfortunately I'm getting another error:

./metricbeat test modules
kubernetes...
  pod...
    error... ERROR cannot unmarshal json response: invalid character '#' looking for beginning of value

When querying the endpoint manually, I noticed these # lines in the response:

node_collector_update_node_health_duration_seconds_count 5050
# HELP node_collector_zone_health [ALPHA] Gauge measuring percentage of healthy nodes per zone.
# TYPE node_collector_zone_health gauge
node_collector_zone_health{zone=""} 100

Can I configure Metricbeat to parse this key-value response and handle the # lines? Also, can you recommend which endpoint to use? In the link you sent there are both the /metrics and the /metrics/cadvisor endpoints, but I'm unsure which to use.

I also tested /metrics/cadvisor and the config from your link, but both lead to the same error.

Those are Prometheus-style metrics...
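
The # HELP / # TYPE lines are comments in the Prometheus text exposition format, which is what the kubelet serves on /metrics and /metrics/cadvisor, hence the "cannot unmarshal json" error from metricsets that expect JSON. Judging by the service.address in your working system result above, the pod, container, and volume metricsets read the kubelet's /stats/summary JSON endpoint, so hosts should stay the bare host:port and let each metricset append its own path. You can see the difference with (same node name as in your logs):

# Prometheus text – lines starting with #
curl --insecure -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://pf4bwj30:10250/metrics" | head

# JSON that the pod/container/volume metricsets parse
curl --insecure -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" "https://pf4bwj30:10250/stats/summary" | head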