Metricbeat autodiscovery: use 'unique' with Prometheus exporters

Metricbeat 7.16.2

Hi,

I'm trying to scrape the Prometheus metrics of multiple pods on different nodes from the same Metricbeat pod.

Currently, Metricbeat is deployed as a DaemonSet and Prometheus collection is done at node scope: each Metricbeat instance collects the metrics of the pods on its own node.

Annotations used:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

Current configuration:

  config:
    metricbeat:
      autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            templates:
              - condition.equals:
                    kubernetes.annotations.prometheus.io/scrape: "true"
                config:
                  - module: prometheus
                    metricsets: ['collector']
                    enabled: true
                    period: 30s
                    hosts: ["${data.kubernetes.pod.ip}:${data.kubernetes.annotations.prometheus.io/port}"]
                    metrics_path: "${data.kubernetes.annotations.prometheus.io/path}"
                    use_types: true
                    rate_counters: false

The issue here is that each Metricbeat instance starts Prometheus module collection at a different timestamp than the others (the collection period is the same: 30s), which complicates dashboard creation in Kibana.

My goal is to remove the timestamp gap in the events generated by Metricbeat by having a single Metricbeat pod do the Prometheus collection.
To do so I tried the options 'unique: true' and 'scope: cluster' in the autodiscover settings:

  config:
    metricbeat:
      autodiscover:
        providers:
          - type: kubernetes
            node: ${NODE_NAME}
            scope: cluster
            unique: true
            templates:
              - condition.equals:
                    kubernetes.annotations.prometheus.io/scrape: "true"
                config:
                  - module: prometheus
                    metricsets: ['collector']
                    enabled: true
                    period: 30s
                    hosts: ["${data.kubernetes.pod.ip}:${data.kubernetes.annotations.prometheus.io/port}"]
                    metrics_path: "${data.kubernetes.annotations.prometheus.io/path}"
                    use_types: true
                    rate_counters: false

When this configuration is applied, Metricbeat doesn't retrieve Prometheus metrics from the pods. I can see in the logs that autodiscover is enabled, but no events are generated.
I enabled debug logs for kubernetes and autodiscover:

    DEBUG   [autodiscover]  autodiscover/autodiscover.go:94 Configured autodiscover provider: kubernetes
    DEBUG   [kubernetes]    util/kubernetes.go:111  Initializing a new Kubernetes watcher using host: k8s-observability-02-3k4zyczbifig-node-1
    INFO    [autodiscover]  autodiscover/autodiscover.go:117        Starting autodiscover manager
    DEBUG   [autodiscover]  kubernetes/kubernetes.go:351    Starting Leader Elector
    7 leaderelection.go:243] attempting to acquire leader lease o11y/metricbeat-cluster-leader...
    7 leaderelection.go:253] successfully acquired lease o11y/metricbeat-cluster-leader
    DEBUG   [autodiscover]  kubernetes/kubernetes.go:295    leader election lock GAINED, id beats-leader-k8s-observability-02-3k4zyczbifig-node-1
    DEBUG   [autodiscover]  k8skeystore/kubernetes_keystore.go:76   Cannot retrieve kubernetes namespace from event: map[id:-1650361960156698536 provider:a5a0072e-78e4-498c-a433-a4563424e39c start:%!s(bool=true) unique:true]
    DEBUG   [kubernetes]    util/kubernetes.go:111  Initializing a new Kubernetes watcher using host: k8s-observability-02-3k4zyczbifig-node-1
    DEBUG   [kubernetes]    kubernetes/watcher.go:184       cache sync done

When I comment out the line 'unique: true', I can see that it works correctly:

DEBUG   [autodiscover]  autodiscover/autodiscover.go:94 Configured autodiscover provider: kubernetes
INFO    [autodiscover]  autodiscover/autodiscover.go:117        Starting autodiscover manager
DEBUG   [autodiscover]  autodiscover/autodiscover.go:181        Got a start event.      {"autodiscover.event": {"config":[{}],"host":"10.100.1.168","id":"7f9addaa-91b4-4949-a9aa-ef214e1f0e59","kubernetes":{"annotations":{"prometheus":{"io/path":"/metrics","io/period":"1m","io/port":"8080","io/scrape":"true"}},"deployment":{"name":"metric-exporter"},"labels":{"app":"metric-exporter","pod-template-hash":"66b6d8bdfd"},"namespace":"o11y","namespace_uid":"732b5dcd-55d9-4e54-965b-fb551c1a8da3","node":{"hostname":"k8s-observability-02-3k4zyczbifig-node-0","labels":{"apm":"allow","beta_kubernetes_io/arch":"amd64","beta_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","beta_kubernetes_io/os":"linux","elasticData":"data","elasticMaster":"master","failure-domain_beta_kubernetes_io/region":"Lab-Rennes-01","failure-domain_beta_kubernetes_io/zone":"ceph-rns-02","filebeat":"allow","heartbeat":"allow","kibana":"allow","kubernetes_io/arch":"amd64","kubernetes_io/hostname":"k8s-observability-02-3k4zyczbifig-node-0","kubernetes_io/os":"linux","magnum_openstack_org/nodegroup":"default-worker","magnum_openstack_org/role":"worker","metricbeat":"allow","node_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","role":"ingress","topology_cinder_csi_openstack_org/zone":"ceph-rns-02","topology_kubernetes_io/region":"Lab-Rennes-01","topology_kubernetes_io/zone":"ceph-rns-02"},"name":"k8s-observability-02-3k4zyczbifig-node-0","uid":"9b407d46-e695-4990-af28-6d4da6a0b6d1"},"pod":{"ip":"10.100.1.168","name":"metric-exporter-66b6d8bdfd-l45vx","uid":"7f9addaa-91b4-4949-a9aa-ef214e1f0e59"},"replicaset":{"name":"metric-exporter-66b6d8bdfd"}},"meta":{"kubernetes":{"deployment":{"name":"metric-exporter"},"labels":{"app":"metric-exporter","pod-template-hash":"66b6d8bdfd"},"namespace":"o11y","namespace_uid":"732b5dcd-55d9-4e54-965b-fb551c1a8da3","node":{"hostname":"k8s-observability-02-3k4zyczbifig-node-0","labels":{"apm":"allow","beta_kubernetes_io/arch":"amd64","beta_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","beta_kubernetes_io/os":"linux","elasticData":"data","elasticMaster":"master","failure-domain_beta_kubernetes_io/region":"Lab-Rennes-01","failure-domain_beta_kubernetes_io/zone":"ceph-rns-02","filebeat":"allow","heartbeat":"allow","kibana":"allow","kubernetes_io/arch":"amd64","kubernetes_io/hostname":"k8s-observability-02-3k4zyczbifig-node-0","kubernetes_io/os":"linux","magnum_openstack_org/nodegroup":"default-worker","magnum_openstack_org/role":"worker","metricbeat":"allow","node_kubernetes_io/instance-type":"764a1e52-7072-40fc-9c12-9665b98aa8c3","role":"ingress","topology_cinder_csi_openstack_org/zone":"ceph-rns-02","topology_kubernetes_io/region":"Lab-Rennes-01","topology_kubernetes_io/zone":"ceph-rns-02"},"name":"k8s-observability-02-3k4zyczbifig-node-0","uid":"9b407d46-e695-4990-af28-6d4da6a0b6d1"},"pod":{"ip":"10.100.1.168","name":"metric-exporter-66b6d8bdfd-l45vx","uid":"7f9addaa-91b4-4949-a9aa-ef214e1f0e59"},"replicaset":{"name":"metric-exporter-66b6d8bdfd"}}},"provider":"6085519b-29c8-4919-b40e-4133c5474340","start":true}}
DEBUG   [autodiscover]  autodiscover/autodiscover.go:202        Generated config: {
  "enabled": true,
  "hosts": [
    "xxxxx"
  ],
  "metrics_path": "/metrics",
  "metricsets": [
    "collector"
  ],
  "module": "prometheus",
  "period": "30s",
  "rate_counters": false,
  "use_types": true
}
DEBUG   [autodiscover]  autodiscover/autodiscover.go:289        Got a meta field in the event
DEBUG   [autodiscover]  cfgfile/list.go:63      Starting reload procedure, current runners: 0
DEBUG   [autodiscover]  cfgfile/list.go:81      Start list: 1, Stop list: 0
DEBUG   [autodiscover]  cfgfile/list.go:105     Starting runner: RunnerGroup{prometheus [metricsets=1]}

Am I missing something to correctly apply the unique autodiscover parameter?

Or, if I can't continue with this solution, is there another way to force the DaemonSet Metricbeats to start Prometheus collection at exactly the same time?

@ChrsMark Could you help on this one please :grimacing: TIA!!

Hi @Dali_Ben_amor
Would it be an option for your use case to use prometheus-server-service-name instead of ${data.kubernetes.pod.ip}:

hosts: ["prometheus-server-service-name:${data.kubernetes.annotations.prometheus.io/port}"]

similar to the configuration for kube-state-metrics here?
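
In your template that would look roughly like this (just a sketch; prometheus-server-service-name is a placeholder for a Service sitting in front of your exporter pods):

  - module: prometheus
    metricsets: ['collector']
    period: 30s
    hosts: ["prometheus-server-service-name:${data.kubernetes.annotations.prometheus.io/port}"]
    metrics_path: "${data.kubernetes.annotations.prometheus.io/path}"
    use_types: true
    rate_counters: false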

I have the impression that you cannot use autodiscover templates along with the unique: true option. If the unique: true setting is used, then only static configs can be used by the leader.
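
By static configs I mean templates without conditions and without ${data.*} placeholders resolved from discovered pods, along the lines of the kube-state-metrics example linked above. A sketch (the Service name and port here are placeholders):

metricbeat:
  autodiscover:
    providers:
      - type: kubernetes
        scope: cluster
        node: ${NODE_NAME}
        unique: true
        templates:
          - config:
              - module: prometheus
                metricsets: ['collector']
                period: 30s
                hosts: ["my-exporter-service:8080"]
                metrics_path: "/metrics"
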
An alternative here would be to deploy Metricbeat as a Deployment alongside the DaemonSet; the Deployment would be responsible for doing the discovery at cluster level.
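
The Deployment (with a single replica, so that pods are not scraped twice) could then run your existing template unchanged, minus the node scoping and unique. A sketch based on your config:

metricbeat:
  autodiscover:
    providers:
      - type: kubernetes
        scope: cluster
        templates:
          - condition.equals:
              kubernetes.annotations.prometheus.io/scrape: "true"
            config:
              - module: prometheus
                metricsets: ['collector']
                period: 30s
                hosts: ["${data.kubernetes.pod.ip}:${data.kubernetes.annotations.prometheus.io/port}"]
                metrics_path: "${data.kubernetes.annotations.prometheus.io/path}"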


Thank you @ChrsMark, this is the information that I was looking for and couldn't find in the docs.

And this is the solution that I am currently testing, along with the following queue options:

queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 5s

I'm trying to group the events of each collection period under the same timestamp, but it looks like the @timestamp field is always filled with Metricbeat's ingestion time for each event.
The result is a few seconds of difference between events of the same period.

Do you think it is possible to overwrite the @timestamp field with the time the event was sent to Elastic by Metricbeat, instead of the time it was ingested by Metricbeat?

Or I could use processors to add a new field, but how can I get the Elastic ingestion timestamp?
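
For example, would an Elasticsearch ingest pipeline using the built-in _ingest.timestamp metadata field do it? Just a sketch of what I have in mind (untested; the pipeline name is made up):

PUT _ingest/pipeline/metricbeat-ingested
{
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

and then reference it from the Metricbeat output:

output.elasticsearch:
  pipeline: metricbeat-ingested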

Hi @Tetiana_Kravchenko
Thank you for your answer !
Correct me if I'm wrong, but I think that with this option I will only get the result of one pod every 30s, since the Service will act as a load balancer between the pods and only one of them will be scraped.
The goal here is to scrape all the pods with the same timestamp.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.