Error running Metricbeat 7.6 on AKS K8S cluster

I'm trying to run Metricbeat 7.6.2 on my AKS cluster (k8s version 1.16), and I'm getting the following errors:

2020-08-27T11:00:03.417Z	INFO	module/wrapper.go:252	Error fetching data for metricset kubernetes.volume: error doing HTTP request to fetch 'volume' Metricset data: error making http request: Get https://aks-agentpool-XXX:10250/stats/summary: x509: certificate signed by unknown authority
2020-08-27T11:00:03.494Z	INFO	module/wrapper.go:252	Error fetching data for metricset kubernetes.node: error doing HTTP request to fetch 'node' Metricset data: error making http request: Get https://aks-agentpool-XXX:10250/stats/summary: x509: certificate signed by unknown authority
2020-08-27T11:00:03.548Z	INFO	module/wrapper.go:252	Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get https://aks-agentpool-XXX:10250/stats/summary: x509: certificate signed by unknown authority
2020-08-27T11:00:03.572Z	INFO	module/wrapper.go:252	Error fetching data for metricset kubernetes.pod: error doing HTTP request to fetch 'pod' Metricset data: error making http request: Get https://aks-agentpool-XXX:10250/stats/summary: x509: certificate signed by unknown authority
2020-08-27T11:00:03.644Z	INFO	module/wrapper.go:252	Error fetching data for metricset kubernetes.container: error doing HTTP request to fetch 'container' Metricset data: error making http request: Get https://aks-agentpool-XXX:10250/stats/summary: x509: certificate signed by unknown authority

My metricbeat-daemonset-config is:

kubernetes.yml: |-
    - module: kubernetes
      metricsets:
        - node
        - system
        - pod
        - container
        - volume
      period: 10s
      host: ${NODE_NAME}
      hosts: ["https://${HOSTNAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.certificate_authorities:
        - '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'

Some notes:

  • The latest Metricbeat version that works for me is 7.4.
  • Metricbeat 7.6 is ok on EKS with this config.

Any suggestions?

Hi!

The error indicates that the certificates you are using cannot be used to reach Kubelet's API.
Do you actually want to enable SSL?

In general you can bypass this using ssl.verification_mode: "none". Otherwise you need to make sure that you are using the proper certificate to access the API.

See https://github.com/elastic/beats/blob/f2956098ef62b0fec1ae02e7cb9659dd6b9e6fe9/deploy/kubernetes/metricbeat-kubernetes.yaml#L116
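As a rough sketch of the bypass described above, the module config from the question could look like this with certificate verification turned off. It assumes the rest of the DaemonSet stays unchanged. Note that this skips verification of the Kubelet's certificate entirely, so it is a workaround rather than a fix:

```yaml
- module: kubernetes
  metricsets:
    - node
    - system
    - pod
    - container
    - volume
  period: 10s
  host: ${NODE_NAME}
  hosts: ["https://${HOSTNAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  # Assumption: verification of the Kubelet's serving certificate is skipped.
  # This line takes the place of the ssl.certificate_authorities entry.
  ssl.verification_mode: "none"
```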

C.

@ChrsMark
Yeah, I did notice that, but I saw in the documentation that setting this field to "none" is not recommended.
Does the ssl.verification_mode field refer to the pod's certificate (the one in /var/run/secrets/kubernetes.io/serviceaccount/ca.crt), or to the certificate that I'm using in output.ssl.certificate_authorities?

It refers to ssl.certificate_authorities that is used to access Kubelet's API (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt). It is also in the same config section: https://github.com/elastic/beats/blob/f2956098ef62b0fec1ae02e7cb9659dd6b9e6fe9/deploy/kubernetes/metricbeat-kubernetes.yaml#L119

@ChrsMark Thanks for the quick response. Just to help me understand it better: do you know what changed from 7.4 to 7.5 that causes those errors when ssl.verification_mode is not "none"?

The initial error you see has to do with an invalid certificate. I'm not sure what the reason could be in this case. It could be the certificate itself, or a change to some option in Kubelet's config.

On our side, from 7.6.2 we switched to accessing the secure port of the Kubelet API. PR: https://github.com/elastic/beats/pull/16063
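
To illustrate the difference, here is a minimal sketch of the two ways of reaching the Kubelet. The read-only port 10255 and the host values are assumptions for illustration and are not taken from the PR. The read-only port may also be disabled on a given cluster:

```yaml
# Use one of the two entries below, not both.

# Assumed earlier setup: plain HTTP on the Kubelet read-only port,
# so no certificate is checked at all.
- module: kubernetes
  metricsets: [node, system, pod, container, volume]
  period: 10s
  hosts: ["${NODE_NAME}:10255"]

# Secure port: HTTPS with a service account token. The Kubelet's
# certificate must be signed by a CA listed here, or verification
# must be disabled with ssl.verification_mode: "none".
- module: kubernetes
  metricsets: [node, system, pod, container, volume]
  period: 10s
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.certificate_authorities:
    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```

With the second form, a Kubelet certificate that is not signed by the CA in ca.crt would produce the "x509: certificate signed by unknown authority" error shown in the question.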