Metricbeat 7.5.1 Kubernetes Daemonset 400 Bad Request

Hello, new user here. I recently deployed ECK on my own server with kubeadm (currently I'm just testing, so I have only one master node, where I also schedule pods), using the ECK operator pattern and the quickstart guide. I then wanted to monitor my Kubernetes cluster with the quickstart Elasticsearch/Kibana, so I turned to Metricbeat with the kubernetes module. I deployed kube-state-metrics as a pod without problems and then deployed Metricbeat on Kubernetes following https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-kubernetes.html. Both the deployment and the daemonset pods come up just fine, but the daemonset pod is logging errors like these:

2020-01-22T13:43:50.000Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.volume: error doing HTTP request to fetch 'volume' Metricset data: HTTP error 400 in : 400 Bad Request
2020-01-22T13:43:50.000Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: HTTP error 400 in : 400 Bad Request
2020-01-22T13:43:50.100Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.node: error doing HTTP request to fetch 'node' Metricset data: HTTP error 400 in : 400 Bad Request
2020-01-22T13:43:50.100Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.container: error doing HTTP request to fetch 'container' Metricset data: HTTP error 400 in : 400 Bad Request
2020-01-22T13:43:50.100Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.pod: error doing HTTP request to fetch 'pod' Metricset data: HTTP error 400 in : 400 Bad Request
2020-01-22T13:43:56.871Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":200,"time":{"ms":102}},"total":{"ticks":1160,"time":{"ms":363},"value":1160},"user":{"ticks":960,"time":{"ms":261}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":12},"info":{"ephemeral_id":"751c1c1e-52f6-4823-b88f-16b1cd5a81a7","uptime":{"ms":90066}},"memstats":{"gc_next":20000144,"memory_alloc":11140640,"memory_total":188857472,"rss":794624},"runtime":{"goroutines":92}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":90,"batches":3,"total":90},"read":{"bytes":21162},"write":{"bytes":120577}},"pipeline":{"clients":4,"events":{"active":0,"published":90,"total":90},"queue":{"acked":90}}},"metricbeat":{"kubernetes":{"container":{"events":3,"failures":3},"node":{"events":3,"failures":3},"pod":{"events":3,"failures":3},"proxy":{"events":9,"success":9},"system":{"events":3,"failures":3},"volume":{"events":3,"failures":3}},"system":{"cpu":{"events":3,"success":3},"load":{"events":3,"success":3},"memory":{"events":3,"success":3},"network":{"events":36,"success":36},"process":{"events":18,"success":18},"process_summary":{"events":3,"success":3}}},"system":{"load":{"1":0.2,"15":0.31,"5":0.25,"norm":{"1":0.0333,"15":0.0517,"5":0.0417}}}}}}

It looks like some requests from the kubernetes module are getting HTTP 400 errors when querying the kubelet, but I haven't been able to figure out why. My Metricbeat daemonset config contains the following:

kubernetes.yml: |-
  - module: kubernetes
    metricsets:
      - node
      - system
      - pod
      - container
      - volume
    period: 10s
    host: ${NODE_NAME}
    hosts: ["localhost:10250"]
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    ssl:
      # disable verification since kubelet uses self signed certificate
      verification_mode: none
      certificate_authorities:
      - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  - module: kubernetes
    metricsets:
      - proxy
    period: 10s
    host: ${NODE_NAME}
    hosts: ["localhost:10249"]

I'm using Metricbeat 7.5.1 (Image: docker.elastic.co/beats/metricbeat:7.5.1), Kubernetes & Kubelet v1.17.1.

Things I tried that didn't work:

  • Giving the Metricbeat pod access to the API server's kubelet client certificate and key for authentication (instead of the service account token)
  • Using $NODE_NAME instead of localhost
  • Passing the debug parameter (-d "*") to Metricbeat, roughly as sketched after this list (I hoped this would let me see which requests were failing with status 400, but it didn't)
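For context, here's roughly where the debug flag goes in the daemonset manifest (an illustrative excerpt, assuming the container layout from the linked running-on-kubernetes guide; only the args list matters here):

containers:
- name: metricbeat
  image: docker.elastic.co/beats/metricbeat:7.5.1
  # "-e" logs to stderr, "-d" "*" enables all debug selectors
  args: [
    "-c", "/etc/metricbeat.yml",
    "-e",
    "-d", "*",
  ]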

Why am I seeing these errors? Am I missing something?
Thanks in advance.

Hi @diegomolina :slight_smile:

Everything looks correct. I think a good first step is to isolate the metricset that might be causing this. That way, we can see whether it's something related to the central code or not.

If I'm understanding the logs correctly, all the kubernetes module metricsets except proxy are failing (node, system, pod, container and volume). Is there a way I can replicate the requests Metricbeat makes to the kubelet with curl or something similar, to try to understand what might be wrong? I don't know the structure of these requests; are they documented somewhere?

EDIT: I went into the Metricbeat pod and tried curl -k -H "Authorization: Bearer $TOKEN" -i https://localhost:10250/stats/summary, which produced normal-looking results like this:

HTTP/2 200
content-type: application/json
date: Wed, 22 Jan 2020 19:43:11 GMT

{
 "node": {
  "nodeName": "myserver",
  "systemContainers": [
   {
    "name": "kubelet",
    "startTime": "2020-01-16T14:08:15Z",
    "cpu": {
     "time": "2020-01-22T19:43:04Z",
     "usageNanoCores": 54390010,
     "usageCoreNanoSeconds": 24288018626709
    },
    "memory": {
     "time": "2020-01-22T19:43:04Z",
     "usageBytes": 92622848,
     "workingSetBytes": 72990720,
     "rssBytes": 59215872,
     "pageFaults": 227163501,
     "majorPageFaults": 22
    }
   },
   {
    "name": "runtime",
    "startTime": "2020-01-21T16:26:33Z",
    ...

What should I do to try isolating the faulty metricset?

To isolate it, just run Metricbeat with a single metricset; that way we can see whether all of them fail (a central issue) or just one.
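For example, a stripped-down config to test only the node metricset could look roughly like this (same settings as yours, just with the metricsets list trimmed; then swap node for each of the other metricsets in turn):

kubernetes.yml: |-
  - module: kubernetes
    metricsets:
      - node
    period: 10s
    host: ${NODE_NAME}
    hosts: ["localhost:10250"]
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    ssl:
      verification_mode: none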

Requests to K8s are plain, nothing like a long query with a ton of parameters, so a normal curl to the kubelet or the API server should be enough to check.

On a second pass over your config, I was wondering if the error could be caused by the proxy metricset configuration, which doesn't include the ssl part.

Maybe you can also prefix your accesses to localhost with https://.
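That is, something along these lines for the kubelet part of your config (just a sketch; only the scheme in hosts changes):

  - module: kubernetes
    metricsets:
      - node
      - system
      - pod
      - container
      - volume
    period: 10s
    host: ${NODE_NAME}
    hosts: ["https://localhost:10250"]
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    ssl:
      verification_mode: none
      certificate_authorities:
      - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt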

Wow! Prefixing localhost with https worked. I'd say that's a bit weird, since the configuration for output.elasticsearch needs a separate protocol parameter set to https, and adding https:// to the hosts string there does nothing:

...
output.elasticsearch:
  protocol: https
  hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
...

Anyway, it's working now. Thanks for your help.
