Metricbeat DNS error in Kubernetes cluster

I have installed Metricbeat using the Helm chart and I am getting the following DNS error:

2021-02-04T21:27:38.074Z WARN [transport] transport/tcp.go:52 DNS lookup failure "k8s-rke-cluster1-node2": lookup k8s-rke-cluster1-node2 on 10.43.0.10:53: server misbehaving

2021-02-04T21:27:38.074Z INFO module/wrapper.go:259 Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get "https://k8s-rke-cluster1-node2:10250/stats/summary": lookup k8s-rke-cluster1-node2 on 10.43.0.10:53: server misbehaving

I have the following entry in metric-beats.yml, but it still doesn't help:
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet

Can you please help?

Thanks
Umesh

When I looked at the DNS logs, I saw NXDOMAIN errors. Is there some configuration I need to update?
[INFO] 10.42.1.161:44476 - 51541 "A IN k8s-rke-cluster1-node1.logging.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000165481s
[INFO] 10.42.1.161:59219 - 6274 "A IN k8s-rke-cluster1-node1.logging. udp 48 false 512" NXDOMAIN qr,rd,ra 123 0.019766515s
[INFO] 10.42.1.161:59138 - 1373 "A IN wire-read.svc.cluster.local. udp 45 false 512" NXDOMAIN qr,aa,rd 138 0.000230953s

Hi!

Can you share the part of your configuration that sets up the kubelet API endpoint?

Also, could you exec into the Metricbeat Pod and try to reach the kubelet API manually with curl?

token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -H "Authorization: Bearer $token" https://${HOSTNAME}:10250/stats/summary --insecure

Note: The curl above is equivalent to the request Metricbeat performs to get the metrics from the kubelet.

C.

Hi Chris,
Please find the details you requested:

      metricbeatConfig:
        metricbeat.yml: |
          metricbeat.modules:
          - module: kubernetes
            metricsets:
              - container
              - node
              - pod
              - system
              - volume
            period: 10s
            host: "${NODE_NAME}"
            hosts: ["https://${NODE_NAME}:10250"]
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: "none"
            # If using Red Hat OpenShift remove ssl.verification_mode entry and
            # uncomment these settings:
            #ssl.certificate_authorities:
              #- /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
            processors:
            - add_kubernetes_metadata: ~

Also, I was able to execute the curl command and reach the kubelet API without any issues.

> umesh@umeshs-MacBook-Pro Local-ELK % kubectl exec -it my-metricbeat-metricbeat-4tht9 -- /bin/bash
> [root@k8s-rke-cluster1-node2 metricbeat]#
> [root@k8s-rke-cluster1-node2 metricbeat]#
> [root@k8s-rke-cluster1-node2 metricbeat]#
> [root@k8s-rke-cluster1-node2 metricbeat]# token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
> [root@k8s-rke-cluster1-node2 metricbeat]# curl -H "Authorization: Bearer $token" https://${HOSTNAME}:10250/stats/summary --insecure
> {
>  "node": {
>   "nodeName": "k8s-rke-cluster1-node2",
>   "systemContainers": [
>    {
>     "name": "kubelet",
>     "startTime": "2021-02-08T03:14:19Z",
>     "cpu": {
>      "time": "2021-02-08T16:48:42Z",
>      "usageNanoCores": 32691702,
>      "usageCoreNanoSeconds": 1536862822056
>     },
>     "memory": {
>      "time": "2021-02-08T16:48:42Z",
>      "usageBytes": 179445760,
>      "workingSetBytes": 57331712,
>      "rssBytes": 49405952,
>      "pageFaults": 32014689,
>      "majorPageFaults": 81
>     }
>    },
>    {
>     "name": "runtime",
>     "startTime": "2021-02-08T03:14:16Z",
>     "cpu": {
>      "time": "2021-02-08T16:48:38Z",
>      "usageNanoCores": 30031690,
>      "usageCoreNanoSeconds": 1571225630130
>     },
>     "memory": {
>      "time": "2021-02-08T16:48:38Z",
>      "usageBytes": 6586425344,
>      "workingSetBytes": 3134005248,
>      "rssBytes": 87359488,
>      "pageFaults": 70791372,
>      "majorPageFaults": 333
>     }
>    },
>    {
>     "name": "pods",
>     "startTime": "2021-02-08T03:14:18Z",
>     "cpu": {
>      "time": "2021-02-08T16:48:35Z",
>      "usageNanoCores": 142213300,
>      "usageCoreNanoSeconds": 9085480733540
>     },
>     "memory": {
>      "time": "2021-02-08T16:48:35Z",
>      "availableBytes": 13652520960,
>      "usageBytes": 3504705536,
>      "workingSetBytes": 3173511168,
>      "rssBytes": 3094118400,
>      "pageFaults": 0,
>      "majorPageFaults": 0
>     }
>    }
>   ],
>   "startTime": "2021-02-08T03:14:11Z",
>   "cpu": {
>    "time": "2021-02-08T16:48:35Z",
>    "usageNanoCores": 241303905,
>    "usageCoreNanoSeconds": 13295512173772
>   },
>   "memory": {
>    "time": "2021-02-08T16:48:35Z",
>    "availableBytes": 10005635072,
>    "usageBytes": 11078508544,
>    "workingSetBytes": 6820397056,
>    "rssBytes": 3609812992,
>    "pageFaults": 444927,
>    "majorPageFaults": 84
>   },

Could you try the same curl with NODE_NAME instead of HOSTNAME?

curl -H "Authorization: Bearer $token" https://${NODE_NAME}:10250/stats/summary --insecure

Also, could you please share the values of these env vars?

Hi Chris,
Please find the details you requested below. It looks like HOSTNAME includes the domain name whereas NODE_NAME doesn't. My resolv.conf does have the domain name, though.
Thanks
Umesh

[root@k8s-rke-cluster1-node2 metricbeat]# curl -H "Authorization: Bearer $token" https://${NODE_NAME}:10250/stats/summary --insecure
curl: (6) Could not resolve host: k8s-rke-cluster1-node2; Unknown error
[root@k8s-rke-cluster1-node2 metricbeat]# echo ${HOSTNAME}
k8s-rke-cluster1-node2.pfg.dom
[root@k8s-rke-cluster1-node2 metricbeat]# echo ${NODE_NAME}
k8s-rke-cluster1-node2
[root@k8s-rke-cluster1-node2 metricbeat]# cat /etc/resolv.conf
nameserver 10.43.0.10
search logging.svc.cluster.local svc.cluster.local cluster.local pfg.dom
options ndots:5
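
For context on the resolv.conf output above: a short name with fewer dots than `ndots` is normally expanded with each `search` domain first and tried as-is last. The sketch below is a simplified model of that ordering, not the resolver's actual code; the function name `search_candidates` is made up for illustration, and real resolvers add retry and error handling that is not modelled here.

```shell
#!/bin/sh
# Print, in order, the names a resolv.conf-style resolver would try for NAME.
# Usage: search_candidates NAME NDOTS DOMAIN...
# (Hypothetical helper for illustration only.)
search_candidates() {
    name=$1
    ndots=$2
    shift 2

    # A trailing dot marks the name as absolute: no search list is applied.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac

    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}

    # Enough dots: the name is tried as-is before the search list.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi

    # Each search domain is appended in the order it is listed.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done

    # Too few dots: the name is tried as-is only after the search list.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

# Values taken from the resolv.conf output above.
search_candidates k8s-rke-cluster1-node2 5 \
    logging.svc.cluster.local svc.cluster.local cluster.local pfg.dom
```

With these values the list includes k8s-rke-cluster1-node2.pfg.dom., which matches HOSTNAME, so it may be worth checking in the DNS logs which of the candidates the cluster DNS fails on.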

Hey again.

So NODE_NAME should be the name of the node as Kubernetes assigns it in the Pod's spec. If you run kubectl get nodes, you should see k8s-rke-cluster1-node2. However, I'm not sure why it differs from HOSTNAME or how HOSTNAME is set in your case. Do you have any special DNS configuration that might cause this?

In any case, to work around this, just change hosts: ["https://${NODE_NAME}:10250"] in your configuration to hosts: ["https://${HOSTNAME}:10250"]. This should do the trick.
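
Before changing the configuration, it can help to confirm from inside the Pod which of the two names actually resolves. The following is a minimal sketch, assuming `getent` is available in the Metricbeat image; `resolve_status` is a made-up helper name. Note that `getent` goes through the system resolver, which may not behave identically to the resolver Metricbeat itself uses.

```shell
#!/bin/sh
# Report whether a name resolves through the system resolver, which
# honours /etc/hosts and the search list in /etc/resolv.conf.
# (Hypothetical helper for illustration only.)
resolve_status() {
    if getent hosts "$1" >/dev/null 2>&1; then
        printf 'OK %s\n' "$1"
    else
        printf 'FAIL %s\n' "$1"
    fi
}

# NODE_NAME and HOSTNAME are the variables discussed in this thread;
# the ":-" fallbacks only keep the sketch runnable outside the Pod.
for candidate in "${NODE_NAME:-unset-node-name.invalid}" \
                 "${HOSTNAME:-unset-hostname.invalid}"; do
    resolve_status "$candidate"
done
```

Whichever name prints OK is the one to use in the hosts: entry.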

Hi Chris,
I don't have any special DNS config. We use Rancher, so I am not sure whether Rancher adds any special config.

Thanks & Regards,
Umesh

Hey!

So did replacing NODE_NAME with HOSTNAME work?