Trouble debugging MetricBeat connection issues

Hi all,

I'm running ECK on a Kubernetes cluster and using FileBeat and MetricBeat to send monitoring data to Elastic.
For the most part, things are running smoothly, except for the MetricBeat Deployment, which is not providing any output data (the DaemonSet works fine).

I've confirmed that the KubeStateMetrics pod is up and running and gathering data as expected, so I assume that either MetricBeat has trouble connecting to KubeStateMetrics or to Elastic.
However, the logs for the MetricBeat pod show no connection errors.

In hopes of isolating the issue, I've changed the 'hosts' settings of the Metricbeat Deployment for both the Elastic and Kube-State-Metrics services to non-existent endpoints, and changed the auth credentials, to see whether that would produce any useful errors.

I have tried (in separate steps):

  • Invalid host for Kube-State-Metrics
  • Invalid host for Elastic
  • Invalid username/password for Elastic
  • Invalid certificate for Elastic

In all cases, no errors are shown at all. Enabling 'debug' logging also yields no useful information.
The Metricbeat logs simply show the 'Non-zero metrics in the last 30s' INFO messages.
It does not seem to matter whether I provide valid or invalid hosts or credentials.

This strikes me as unexpected behavior, and makes me think I'm overlooking something.
Can someone verify that under normal circumstances, connection errors should be shown?

And are there further steps I can take to debug/isolate the problem?

Deployment config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-config
  namespace: elastic
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Reload module configs as they change:
      reload.enabled: false
    processors:
      - add_cloud_metadata:
      - add_kubernetes_metadata:
          in_cluster: true
    setup.ilm.enabled: false
    output.elasticsearch:
      hosts: ['https://elastic-es-http:9200']
      ssl.certificate_authorities: ["/usr/share/elastic/certs/ca.crt"]
      ssl.certificate: '/usr/share/elastic/certs/tls.crt'
      ssl.key: '/usr/share/elastic/certs/tls.key'
      username: '{username}'
      password: "{password}"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-modules
  namespace: elastic
  labels:
    k8s-app: metricbeat
data:
  # This module requires `kube-state-metrics` up and running under `kube-system` namespace
  kubernetes.yml: |-
    - module: kubernetes
      labels.dedot: true
      annotations.dedot: true
      metricsets:
        - state_node
        - state_deployment
        - state_replicaset
        - state_pod
        - state_container
        - state_statefulset
        # Collect k8s events:
        - event
      period: 10s
      hosts: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]
      add_metadata: true
      in_cluster: true
      enabled: true
---
# Deploy singleton instance in the whole cluster for some unique data sources, like kube-state-metrics
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: metricbeat
  namespace: elastic
  labels:
    k8s-app: metricbeat
spec:
  template:
    metadata:
      creationTimestamp: ~
      labels:
        k8s-app: metricbeat
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: agentpool
                    operator: In
                    values:
                      - elastic
      containers:
        - args:
            - "-c"
            - /etc/metricbeat.yml
            - "-e"
          image: "docker.elastic.co/beats/metricbeat-oss:7.5.0"
          imagePullPolicy: IfNotPresent
          name: metricbeat
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          securityContext:
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /usr/share/elastic/certs/
              name: elastic-internal-http-certificates
              readOnly: true
            - mountPath: /etc/metricbeat.yml
              name: config
              readOnly: true
              subPath: metricbeat.yml
            - mountPath: /usr/share/metricbeat/modules.d
              name: modules
              readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: metricbeat
      serviceAccountName: metricbeat
      terminationGracePeriodSeconds: 30
      tolerations:
        - effect: NoSchedule
          key: restriction
          operator: Equal
          value: elastic
      volumes:
        - configMap:
            defaultMode: 384
            name: metricbeat-deployment-config
          name: config
        - name: elastic-internal-http-certificates
          secret:
            defaultMode: 420
            optional: false
            secretName: elastic-es-http-certs-internal
        - configMap:
            defaultMode: 384
            name: metricbeat-deployment-modules
          name: modules
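
A note on the config above: the `metricbeat.config.modules` block sets `reload.enabled` but no `path`. As far as I know, `path` is the glob Metricbeat uses to find external module files, and Elastic's sample Kubernetes manifests set it to `${path.config}/modules.d/*.yml`. Without it, the mounted `kubernetes.yml` may never be loaded. Below is a minimal Python sketch of that check. The helper name `modules_path` is made up for illustration, and the line scan is a simplification, not a real YAML parser.

```python
def modules_path(config_text):
    """Return the `path` value under `metricbeat.config.modules`, or None.

    Simplified line scan for illustration only; it does not handle
    every YAML construct (for example values containing '#').
    """
    inside = False
    base_indent = 0
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if line.strip() == "metricbeat.config.modules:":
            inside, base_indent = True, indent
            continue
        if inside:
            if indent <= base_indent:
                inside = False  # left the modules block
                continue
            key, _, value = line.strip().partition(":")
            if key == "path":
                return value.strip()
    return None


# Fragment copied from the posted ConfigMap.
posted = """\
metricbeat.config.modules:
  # Reload module configs as they change:
  reload.enabled: false
processors:
  - add_cloud_metadata:
"""

print(modules_path(posted))  # None: no module path is configured
```

If the function returns `None` for the deployed `metricbeat.yml`, adding a `path` entry that matches the mounted `modules.d` directory would be the first thing to try.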

The same output configuration you use in Filebeat to reach Elasticsearch should work for Metricbeat.

Then, can you open a shell in the Metricbeat pod, install curl, and try:

curl http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics

If that succeeds, can you restart the Metricbeat pod and check whether there are any errors at startup? The kubernetes metricsets use the Kubernetes API to "enrich" reported events, so a problem contacting the API from the Metricbeat pod could lead to no events being reported.
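
Where installing curl in the pod is inconvenient, the same endpoint check can be sketched in Python 3 from any pod or machine that can reach the service. This is only an illustration: the function names `fetch_metrics` and `count_samples` are hypothetical, and the `sample` text is made-up data in the Prometheus text format that kube-state-metrics exposes, not real output from this cluster.

```python
from collections import Counter
from urllib.request import urlopen

KSM_URL = "http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics"


def fetch_metrics(url=KSM_URL, timeout=5):
    """Download the raw metrics page (requires network access to the service)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def count_samples(text):
    """Count sample lines per metric name, skipping comments and blank lines."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Made-up sample for illustration; real output is much longer.
sample = """\
# HELP kube_pod_info Information about pod.
# TYPE kube_pod_info gauge
kube_pod_info{namespace="elastic",pod="example-pod-a"} 1
kube_pod_info{namespace="kube-system",pod="example-pod-b"} 1
kube_node_info{node="example-node"} 1
"""

counts = count_samples(sample)
print(counts["kube_pod_info"], counts["kube_node_info"])  # 2 1
```

Against the live service, `count_samples(fetch_metrics())` should return non-zero counts for the `kube_*` metrics if kube-state-metrics is reachable and serving data.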

Thanks for the response! When I curl to the metrics endpoint as you suggest, I get the overview of metrics as expected. All seems to be in order on that end.
Restarting the pod gives the following startup log:

2020-01-08T12:43:07.216Z	INFO	instance/beat.go:610	Home path: [/usr/share/metricbeat] Config path: [/usr/share/metricbeat] Data path: [/usr/share/metricbeat/data] Logs path: [/usr/share/metricbeat/logs]
2020-01-08T12:43:07.267Z	INFO	instance/beat.go:618	Beat ID: 1d8a1e00-f468-401c-818e-f73d5ed1dd5b
2020-01-08T12:43:07.275Z	INFO	add_cloud_metadata/add_cloud_metadata.go:93	add_cloud_metadata: hosting provider type detected as az, metadata={"instance":{"id":"3e379e50-c668-4cbf-ab13-41a432efab23","name":"aks-elastic-29012786-vmss_0"},"machine":{"type":"Standard_D2_v2"},"provider":"az","region":"westeurope"}
2020-01-08T12:43:07.289Z	INFO	add_kubernetes_metadata/kubernetes.go:68	add_kubernetes_metadata: kubernetes env detected, with version: v1.14.8
2020-01-08T12:43:07.289Z	INFO	kubernetes/util.go:94	kubernetes: Using pod name metricbeat-7dcd6cf656-fnktg and namespace elastic to discover kubernetes node
2020-01-08T12:43:07.303Z	INFO	kubernetes/util.go:100	kubernetes: Using node aks-elastic-29012786-vmss000000 discovered by in cluster pod node query
2020-01-08T12:43:07.406Z	INFO	[seccomp]	seccomp/seccomp.go:124	Syscall filter successfully installed
2020-01-08T12:43:07.407Z	INFO	[beat]	instance/beat.go:941	Beat info	{"system_info": {"beat": {"path": {"config": "/usr/share/metricbeat", "data": "/usr/share/metricbeat/data", "home": "/usr/share/metricbeat", "logs": "/usr/share/metricbeat/logs"}, "type": "metricbeat", "uuid": "1d8a1e00-f468-401c-818e-f73d5ed1dd5b"}}}
2020-01-08T12:43:07.407Z	INFO	[beat]	instance/beat.go:950	Build info	{"system_info": {"build": {"commit": "6d0d0ae079e5cb1d4f224801ac6df926dfb1594c", "libbeat": "7.5.0", "time": "2019-11-25T23:47:32.000Z", "version": "7.5.0"}}}
2020-01-08T12:43:07.407Z	INFO	[beat]	instance/beat.go:953	Go runtime info	{"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":2,"version":"go1.12.12"}}}
2020-01-08T12:43:07.410Z	INFO	[beat]	instance/beat.go:957	Host info	{"system_info": {"host": {"architecture":"x86_64","boot_time":"2019-11-18T07:37:39Z","containerized":false,"name":"metricbeat-7dcd6cf656-fnktg","ip":["127.0.0.1/8","10.244.2.75/24"],"kernel_version":"4.15.0-1060-azure","mac":["2e:6d:66:14:37:45"],"os":{"family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":7,"patch":1908,"codename":"Core"},"timezone":"UTC","timezone_offset_sec":0}}}
2020-01-08T12:43:07.410Z	INFO	[beat]	instance/beat.go:986	Process info	{"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"effective":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/metricbeat", "exe": "/usr/share/metricbeat/metricbeat", "name": "metricbeat", "pid": 1, "ppid": 0, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2020-01-08T12:43:05.820Z"}}}
2020-01-08T12:43:07.410Z	INFO	instance/beat.go:297	Setup Beat: metricbeat; Version: 7.5.0
2020-01-08T12:43:07.411Z	INFO	elasticsearch/client.go:171	Elasticsearch url: https://elastic-es-http:9200
2020-01-08T12:43:07.411Z	INFO	[publisher]	pipeline/module.go:97	Beat name: metricbeat-7dcd6cf656-fnktg
2020-01-08T12:43:07.411Z	INFO	[monitoring]	log/log.go:118	Starting metrics logging every 30s
2020-01-08T12:43:07.411Z	INFO	instance/beat.go:429	metricbeat start running.
2020-01-08T12:43:07.412Z	INFO	cfgfile/reload.go:171	Config reloader started
2020-01-08T12:43:07.412Z	INFO	cfgfile/reload.go:226	Loading of config files completed.
2020-01-08T12:43:37.413Z	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":50,"time":{"ms":54}},"total":{"ticks":110,"time":{"ms":123},"value":0},"user":{"ticks":60,"time":{"ms":69}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":8},"info":{"ephemeral_id":"7c7068f6-cafb-4049-9022-b3e4c22fccb7","uptime":{"ms":30238}},"memstats":{"gc_next":9704816,"memory_alloc":7072720,"memory_total":15775304,"rss":56057856},"runtime":{"goroutines":30}},"libbeat":{"config":{"module":{"running":0},"reloads":1},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":0}}},"system":{"cpu":{"cores":2},"load":{"1":1.57,"15":0.88,"5":1.19,"norm":{"1":0.785,"15":0.44,"5":0.595}}}}}}

From there it continues to output the 'Non-zero metrics' lines at 30 second intervals. No errors or warnings that I can see.
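
One detail in the 'Non-zero metrics' line above may be worth a closer look: it reports `"module":{"running":0}` and `"pipeline":{"clients":0,...}`. My understanding is that this means no module is running and nothing is publishing events, which could also explain why invalid hosts or credentials produce no errors, since the output may never be contacted when there is nothing to send. Below is a small Python sketch for pulling those counters out of such a line. The function name `monitoring_counters` is hypothetical, and the sample line is a trimmed copy of the one posted above.

```python
import json


def monitoring_counters(log_line):
    """Extract a few libbeat counters from a 'Non-zero metrics' log line."""
    # The JSON payload starts at the first "{" in the line.
    payload = json.loads(log_line[log_line.index("{"):])
    libbeat = payload["monitoring"]["metrics"]["libbeat"]
    return {
        "modules_running": libbeat["config"]["module"]["running"],
        "pipeline_clients": libbeat["pipeline"]["clients"],
        "active_events": libbeat["pipeline"]["events"]["active"],
    }


# Trimmed copy of the posted log line (only the libbeat section is kept).
line = (
    '2020-01-08T12:43:37.413Z\tINFO\t[monitoring]\tlog/log.go:145\t'
    'Non-zero metrics in the last 30s\t'
    '{"monitoring": {"metrics": {"libbeat":{"config":{"module":{"running":0},'
    '"reloads":1},"output":{"type":"elasticsearch"},'
    '"pipeline":{"clients":0,"events":{"active":0}}}}}}'
)

print(monitoring_counters(line))
```

If `modules_running` stays at 0 across restarts, the module configuration is probably not being picked up, and that would be the place to look before the network or credentials.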

