Metricbeat 7.16.2
Hi,
I'm trying to scrape Prometheus metrics from multiple pods on different nodes with a single Metricbeat pod.
Currently, Metricbeat is deployed as a DaemonSet and Prometheus collection is done at node scope: each Metricbeat instance collects the metrics of the pods running on its own node.
Annotations used:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"
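For context, these annotations sit on the pod template of the workloads to scrape. A minimal sketch of such a workload (the image name is hypothetical; `metric-exporter` matches the exporter visible in the debug logs further down):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metric-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: metric-exporter
  template:
    metadata:
      labels:
        app: metric-exporter
      # These annotations are what the autodiscover condition matches on.
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: exporter
          image: my-exporter:latest   # hypothetical image
          ports:
            - containerPort: 8080
```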
Current configuration:
config:
  metricbeat:
    autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          templates:
            - condition.equals:
                kubernetes.annotations.prometheus.io/scrape: "true"
              config:
                - module: prometheus
                  metricsets: ['collector']
                  enabled: true
                  period: 30s
                  hosts: ["${data.kubernetes.pod.ip}:${data.kubernetes.annotations.prometheus.io/port}"]
                  metrics_path: "${data.kubernetes.annotations.prometheus.io/path}"
                  use_types: true
                  rate_counters: false
The issue here is that each Metricbeat instance starts the Prometheus module collection at a different timestamp than the others (the collection period is the same, 30s), which complicates dashboard creation in Kibana.
My goal is to remove the timestamp gap between the events generated by Metricbeat, by collecting the Prometheus metrics from only one pod.
To do so, I tried the options 'unique: true' and 'scope: cluster' in the autodiscover settings:
config:
  metricbeat:
    autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          scope: cluster
          unique: true
          templates:
            - condition.equals:
                kubernetes.annotations.prometheus.io/scrape: "true"
              config:
                - module: prometheus
                  metricsets: ['collector']
                  enabled: true
                  period: 30s
                  hosts: ["${data.kubernetes.pod.ip}:${data.kubernetes.annotations.prometheus.io/port}"]
                  metrics_path: "${data.kubernetes.annotations.prometheus.io/path}"
                  use_types: true
                  rate_counters: false
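For comparison, if I understand the 7.x documentation correctly, the documented `unique: true` examples attach the configuration without a pod-metadata condition, since only the elected leader runs it; a sketch along those lines (the kube-state-metrics service address is an assumption for illustration):

```yaml
metricbeat:
  autodiscover:
    providers:
      - type: kubernetes
        scope: cluster
        node: ${NODE_NAME}
        unique: true
        templates:
          # No condition: the single elected leader runs this config as-is.
          - config:
              - module: kubernetes
                metricsets: ['state_node']
                hosts: ["kube-state-metrics:8080"]   # assumed service address
                period: 30s
```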
When this configuration is applied, Metricbeat doesn't retrieve Prometheus metrics from the pods: I can see in the logs that autodiscover is enabled, but no events are generated.
I enabled debug logs for the kubernetes and autodiscover components:
DEBUG [autodiscover] autodiscover/autodiscover.go:94 Configured autodiscover provider: kubernetes
DEBUG [kubernetes] util/kubernetes.go:111 Initializing a new Kubernetes watcher using host: k8s-observability-02-3k4zyczbifig-node-1
INFO [autodiscover] autodiscover/autodiscover.go:117 Starting autodiscover manager
DEBUG [autodiscover] kubernetes/kubernetes.go:351 Starting Leader Elector
7 leaderelection.go:243] attempting to acquire leader lease o11y/metricbeat-cluster-leader...
7 leaderelection.go:253] successfully acquired lease o11y/metricbeat-cluster-leader
DEBUG [autodiscover] kubernetes/kubernetes.go:295 leader election lock GAINED, id beats-leader-k8s-observability-02-3k4zyczbifig-node-1
DEBUG [autodiscover] k8skeystore/kubernetes_keystore.go:76 Cannot retrieve kubernetes namespace from event: map[id:-1650361960156698536 provider:a5a0072e-78e4-498c-a433-a4563424e39c start:%!s(bool=true) unique:true]
DEBUG [kubernetes] util/kubernetes.go:111 Initializing a new Kubernetes watcher using host: k8s-observability-02-3k4zyczbifig-node-1
DEBUG [kubernetes] kubernetes/watcher.go:184 cache sync done
When I comment out the line 'unique: true', I can see that it works correctly:
DEBUG [autodiscover] autodiscover/autodiscover.go:94 Configured autodiscover provider: kubernetes
INFO [autodiscover] autodiscover/autodiscover.go:117 Starting autodiscover manager
DEBUG [autodiscover] autodiscover/autodiscover.go:181 Got a start event. {"autodiscover.event": {"config":[{}],"host":"10.100.1.168","id":"7f9addaa-91b4-4949-a9aa-ef214e1f0e59","kubernetes":{"annotations":{"prometheus":{"io/path":"/metrics","io/period":"1m","io/port":"8080","io/scrape":"true"}},"deployment":{"name":"metric-exporter"},"labels":{"app":"metric-exporter","pod-template-hash":"66b6d8bdfd"},"namespace":"o11y","namespace_uid":"732b5dcd-55d9-4e54-965b-fb551c1a8da3","node":{"hostname":"k8s-observability-02-3k4zyczbifig-node-0","labels":{"apm":"allow","beta_kubernetes_io/arch":"amd64","beta_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","beta_kubernetes_io/os":"linux","elasticData":"data","elasticMaster":"master","failure-domain_beta_kubernetes_io/region":"Lab-Rennes-01","failure-domain_beta_kubernetes_io/zone":"ceph-rns-02","filebeat":"allow","heartbeat":"allow","kibana":"allow","kubernetes_io/arch":"amd64","kubernetes_io/hostname":"k8s-observability-02-3k4zyczbifig-node-0","kubernetes_io/os":"linux","magnum_openstack_org/nodegroup":"default-worker","magnum_openstack_org/role":"worker","metricbeat":"allow","node_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","role":"ingress","topology_cinder_csi_openstack_org/zone":"ceph-rns-02","topology_kubernetes_io/region":"Lab-Rennes-01","topology_kubernetes_io/zone":"ceph-rns-02"},"name":"k8s-observability-02-3k4zyczbifig-node-0","uid":"9b407d46-e695-4990-af28-6d4da6a0b6d1"},"pod":{"ip":"10.100.1.168","name":"metric-exporter-66b6d8bdfd-l45vx","uid":"7f9addaa-91b4-4949-a9aa-ef214e1f0e59"},"replicaset":{"name":"metric-exporter-66b6d8bdfd"}},"meta":{"kubernetes":{"deployment":{"name":"metric-exporter"},"labels":{"app":"metric-exporter","pod-template-hash":"66b6d8bdfd"},"namespace":"o11y","namespace_uid":"732b5dcd-55d9-4e54-965b-fb551c1a8da3","node":{"hostname":"k8s-observability-02-3k4zyczbifig-node-0","labels":{"apm":"allow","beta_kubernetes_io/arch":"amd64","beta_kubernetes_io/in
stance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","beta_kubernetes_io/os":"linux","elasticData":"data","elasticMaster":"master","failure-domain_beta_kubernetes_io/region":"Lab-Rennes-01","failure-domain_beta_kubernetes_io/zone":"ceph-rns-02","filebeat":"allow","heartbeat":"allow","kibana":"allow","kubernetes_io/arch":"amd64","kubernetes_io/hostname":"k8s-observability-02-3k4zyczbifig-node-0","kubernetes_io/os":"linux","magnum_openstack_org/nodegroup":"default-worker","magnum_openstack_org/role":"worker","metricbeat":"allow","node_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","role":"ingress","topology_cinder_csi_openstack_org/zone":"ceph-rns-02","topology_kubernetes_io/region":"Lab-Rennes-01","topology_kubernetes_io/zone":"ceph-rns-02"},"name":"k8s-observability-02-3k4zyczbifig-node-0","uid":"9b407d46-e695-4990-af28-6d4da6a0b6d1"},"pod":{"ip":"10.100.1.168","name":"metric-exporter-66b6d8bdfd-l45vx","uid":"7f9addaa-91b4-4949-a9aa-ef214e1f0e59"},"replicaset":{"name":"metric-exporter-66b6d8bdfd"}}},"provider":"6085519b-29c8-4919-b40e-4133c5474340","start":true}}
DEBUG [autodiscover] autodiscover/autodiscover.go:202 Generated config: {
  "enabled": true,
  "hosts": [
    "xxxxx"
  ],
  "metrics_path": "/metrics",
  "metricsets": [
    "collector"
  ],
  "module": "prometheus",
  "period": "30s",
  "rate_counters": false,
  "use_types": true
}
DEBUG [autodiscover] autodiscover/autodiscover.go:289 Got a meta field in the event
DEBUG [autodiscover] cfgfile/list.go:63 Starting reload procedure, current runners: 0
DEBUG [autodiscover] cfgfile/list.go:81 Start list: 1, Stop list: 0
DEBUG [autodiscover] cfgfile/list.go:105 Starting runner: RunnerGroup{prometheus [metricsets=1]}
Am I missing something needed to correctly apply the 'unique' autodiscover parameter?
Or, if this approach can't work, is there another way to force the DaemonSet Metricbeat instances to start Prometheus collection at exactly the same time?