Dial tcp {kubelet_ip}:10250 i/o timeout while trying to get Metricbeat to talk to Azure Kubernetes Service

I'm getting these errors when installing Metricbeat as a DaemonSet using Terraform.

What could be the issue: Metricbeat or AKS?

2022-05-03T20:09:01.492Z ERROR module/wrapper.go:259 Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get "{my_aks_url}:10250/stats/summary": dial tcp  {my_aks_IP}:10250: i/o timeout


2022-05-03T20:09:01.492Z ERROR module/wrapper.go:259 Error fetching data for metricset kubernetes.volume: error doing HTTP request to fetch 'volume' Metricset data: error making http request: Get "{my_aks_url}:10250/stats/summary": dial tcp {my_aks_IP}:10250: i/o timeout


2022-05-03T20:09:01.612Z ERROR [kubernetes.container] container/container.go:93 error making http request: Get "{my_aks_url}:10250/stats/summary": dial tcp {my_aks_IP}:10250: i/o timeout

Hi @surprised_ferret

Are you using the default configuration for this field, hosts: ["https://${NODE_NAME}:10250"], as in beats/metricbeat-kubernetes.yaml on the main branch of elastic/beats on GitHub?

You can exec into one of the Metricbeat pods and try to access the kubelet directly:

kubectl exec -it <metricbeat-pod> -- bash

token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -k -H "Authorization: Bearer ${token}" "https://${NODE_NAME}:10250/stats/summary"

Does it fail with the same error?
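
If you want to separate a network problem from an authentication or TLS problem, a plain TCP probe is enough to reproduce the timeout. Here is a minimal sketch, assuming bash and the coreutils timeout command are available in the container. probe_tcp is a hypothetical helper name, not part of Metricbeat or kubectl.

```shell
# Hypothetical helper: report whether host:port accepts a TCP connection.
# Prints "open", "timeout", or "closed"; returns 0 only for "open".
# Assumes bash (for /dev/tcp) and coreutils `timeout` are installed.
probe_tcp() {
  local host="$1" port="$2" deadline="${3:-5}"
  # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>
  timeout "$deadline" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
  case $? in
    0)   echo "open" ;;
    124) echo "timeout"; return 1 ;;   # `timeout` killed the attempt
    *)   echo "closed"; return 1 ;;    # refused, unreachable, or DNS failure
  esac
}
```

Inside the pod, running probe_tcp "${NODE_NAME}" 10250 and getting "timeout" would point at the network path or the target address rather than at the token or the TLS settings.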

I am passing this YAML to Terraform:

daemonset:
  metricbeatConfig:
    metricbeat.yml: |
      metricbeat.modules:
      - module: kubernetes
        metricsets:
          - container
          - node
          - pod
          - system
          - volume
        period: 10s
        host: "${NODE_NAME}" # passed in as azurerm_kubernetes_cluster.global.fqdn
        hosts: ["${NODE_NAME}:10250"]
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        ssl.verification_mode: "none"
        processors:
        - add_kubernetes_metadata: ~
      - module: kubernetes
        enabled: true
        metricsets:
          - event
      - module: system
        period: 10s
        metricsets:
          - cpu
          - load
          - memory
          - network
          - process
          - process_summary
        processes: ['.*']
        process.include_top_n:
          by_cpu: 5
          by_memory: 5
      - module: system
        period: 1m
        metricsets:
          - filesystem
          - fsstat
        processors:
        - drop_event.when.regexp:
            system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'
      output.elasticsearch:
        protocol: https
        hosts: '${ELASTICSEARCH_HOSTS}' # passed in as ec_deployment.global.elasticsearch.0.https_endpoint
        username: '${ELASTICSEARCH_USERNAME}' # passed in as elasticstack_elasticsearch_security_user.metricbeat_user.username
        password: '${ELASTICSEARCH_PASSWORD}' # passed in as elasticstack_elasticsearch_security_user.metricbeat_user.password
logging.metrics.enabled: false

I tried to log into a pod within the metricbeat daemonset, but it says 'not found'.

I also tried to log into a pod within the metricbeat-metricbeat-metrics deployment, but same issue.

Actually hold on, I needed to pass in namespace :upside_down_face: hold please...

Getting a timeout here as well, @Tetiana_Kravchenko:

root@metricbeat-metricbeat-78pt9:/usr/share/metricbeat# curl -k -H "Authorization: Bearer $(echo $token)" "https://{my_node_name}.hcp.centralus.azmk8s.io:10250/stats/summary"

curl: (28) Failed to connect to {my_node_name}.hcp.centralus.azmk8s.io port 10250: Connection timed out
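
One thing worth ruling out, going by the hostname in the curl output: a name ending in .hcp.<region>.azmk8s.io has the shape of the AKS API server FQDN rather than a worker node name, and the kubelet on port 10250 listens on the nodes. The reference Metricbeat manifest sets NODE_NAME from the pod's spec.nodeName. This is an assumption based on the output above and is not confirmed in this thread. A small sketch to flag such a value follows; looks_like_aks_api_fqdn is a hypothetical helper and the pattern is only a heuristic.

```shell
# Hypothetical heuristic: does the value look like an AKS API server FQDN
# (<dns-prefix>.hcp.<region>.azmk8s.io) instead of a node name?
# Returns 0 when it matches, 1 otherwise. Private clusters may use other suffixes.
looks_like_aks_api_fqdn() {
  case "$1" in
    *.hcp.*.azmk8s.io) return 0 ;;
    *)                 return 1 ;;
  esac
}
```

For example, looks_like_aks_api_fqdn "${NODE_NAME}" && echo "NODE_NAME looks like the API server, not a node" can be run inside the pod, and the result compared against the names listed by kubectl get nodes.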

I think it might be related to AKS internals; a similar issue is mentioned in the AKS documentation.