Metricset:state_container slow to ingest and missing configured 10s sample intervals

Observing slow sample rate for module:kubernetes metricset: state_container on cluster with +-2000 containers. Documents are ingested roughly every 13 seconds.

- Versions:
metricbeat: 7.12.0 & 6.8.2
kubernetes: 1.19.2
kube-state-metrics: v1.9.8

- Steps to Reproduce:

Dedicate metricbeat deployment to collect only container states at 10s interval

    - module: kubernetes
      metricsets:
        - state_container
      period: 10s
      host: ${NODE_NAME}
      hosts: ["kube-state-metrics:8080"]

Kibana logs shows show consistent sample interval of 12-13 seconds when 10seconds is expected:

Time	                    kubernetes.pod.name	        kubernetes.container.cpu.request.cores
Apr 7, 2021 @ 15:35:05.764	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:34:52.727	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:34:39.529	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:34:26.411	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:34:13.165	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:34:00.033	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:33:47.448	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:33:34.743	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:33:21.628	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:33:08.755	kyverno-598d67f5f9-95r95	0.2
Apr 7, 2021 @ 15:32:55.184	kyverno-598d67f5f9-95r95	0.2

- Observations:
Fluctuation in number of kube-state-metrics container states collected (4000 / 6000)
metricbeat log output:

2021-04-07T22:16:16.152Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4452,"success":4452}}}
2021-04-07T22:16:46.152Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4452,"success":4452}}}
2021-04-07T22:17:16.152Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":6680,"success":6680}}}
2021-04-07T22:17:46.152Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4456,"success":4456}}}
2021-04-07T22:18:16.151Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4455,"success":4455}}}
2021-04-07T22:18:46.153Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4455,"success":4455}}}
2021-04-07T22:19:16.152Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":6686,"success":6686}}}
2021-04-07T22:19:46.151Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4454,"success":4454}}}
2021-04-07T22:20:16.152Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4452,"success":4452}}}
2021-04-07T22:20:46.151Z	INFO	[monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4458,"success":4458}}}

Metric collection time from kube-state-metric for +-300k metrics is around 2seconds:

$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr  7 22:09:03 UTC 2021
   294381    591129  46625296
real	0m 1.96s
user	0m 0.01s
sys	0m 0.15s

$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr  7 22:09:09 UTC 2021
   294381    591129  46625296
real	0m 1.97s
user	0m 0.05s
sys	0m 0.12s

$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr  7 22:09:15 UTC 2021
   294382    591131  46625426
real	0m 2.05s
user	0m 0.02s
sys	0m 0.14s

$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr  7 22:09:22 UTC 2021
   294382    591131  46625458
real	0m 2.07s
user	0m 0.03s
sys	0m 0.13s

$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr  7 22:09:30 UTC 2021
   294382    591131  46625458
real	0m 2.00s
user	0m 0.02s
sys	0m 0.13s

metricbeat deployment is not bounded by memory or cpu:

      name: metricbeat
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 100Mi
$ k top pod -n kube-system metricbeat-756d654796-ws4dc
NAME                          CPU(cores)   MEMORY(bytes)   
metricbeat-756d654796-ws4dc   1285m        400Mi 

All other state_* metricsets ingest without issue every 10seconds.

Questions:

  • Is there a known upper limit to the number containers that can be monitored in a 10second interval?
  • Why would the number of metricbeat container_state events fluctuate between 4000-6000?
  • How can the container_state process be scaled to handle +2000 containers?

Hey @MnrGreg,

What you describe is more or less expected.

The period configured in Metricbeat modules can be seen as a minimum period between data collections. If each data collection takes less that the duration configured in period, then an event will be collected for each period, but if data collection takes more than the period, next collection won't be triggered till the current one finishes.

In your case it may be happening that the data collection of the state_container metricset takes about 12-13 seconds, so the events are collected with this frequency. Probably if you increase the period to let's say 20 seconds, you will see events every 20 seconds.

Apart of making this request, Metricbeat needs to process it and generate all the events for each one of these 2000 containers.

It is possible that state_container takes more time than other state_* metricsets because it has to generate many more events.

It depends on the case. It is going to depend on the size of the response from kube-state-metrics, the available bandwidth and the available CPU. But it is very possible that in scenarios with many containers, 10 seconds is not enough to do a collection.

This would mean that Metricbeat is generating between 4000 and 6000 events every time these metrics are collected. These metrics are usually generated every 30 seconds. If in this case Metricbeat is collecting events for around 2000 containers every 12-13 seconds, it can be that it is collecting between 4000 and 6000 events every 30 seconds. Usually 2 collections happen in these 30 seconds (then the 4K values), and sometimes 3 collections happen (and then there are 6K values).

Well, it seems to be currently able to handle 2000 containers, but not in the configured period. If you want to have a more consistent ratio of events collected, you should increase the period. Something else you can try is to request more cpu for Metricbeat, so it can generate the events faster.
Also, you can try to have different Metricbeat instances for different state_* metricsets. For example you could have a dedicated instance for state_container, and other one for the rest of the state_* metricsets if they are being faster to process.

For bigger cluster sizes you may also need to increase the timeout, that may affect the requests done. It defaults to 10 seconds and is applied to requests done by Metricbeat. In this case it would mean that requests to kube-state-metrics that take more than 10 seconds would be interrupted with the default configuration.

There are also a couple of open issues that should improve the efficiency of these metricsets, and help reducing their collection times:

Hi @jsoriano

This particular metricbeat deployment is already dedicated to just collecting state_container. What's curious is that the other metricbeat deployment is configured with the remaining state_* metricsets [deployment, pods, node, events, etc] and manages to sample every 10s without issue.

The state_container metricbeat consumes 1.3 cores consistently. No CPU Limit is set. The CPU Request has since been increased to 1 core but not too sure how this will help as the nodes are not under pressure.

If there is no processing optimization that can be done within state_container then what are your thoughts on following:

  1. Interim - run a second kube-state-metrics which is configured to only collect container related metrics. Point the dedicated state_container metricbeat deployment to this new kube-state-metrics instance via a different service name. Should reduce the amount of metrics collected and reduce the sorting time.

  2. Long term - log a feature request to have metricbeat state_* deployment made shard aware. One might then be able to simply increase the metricbeat deployment replicas inline with the number of kube-state-metrics shards. The single call per endpoint #24869 would simply this further by moving back to a single deployment for all state_* metricsets.

Yeah, if the nodes are not under pressure and Metricbeat is constantly taking all the CPU it needs you are right, I also don't think this will help.

This may help, but up to a certain point. As you mention, getting the response from kube-state-metrics takes about 2 seconds, the other ~10 seconds Metricbeat is processing and sending the events. I would expect these 10 seconds to be similar with a response that only includes containers.

Yes, sharding may be a good idea. This actually reminds me two things.

There was already an attempt of introducing sharding in autodiscover for its use with other kubernetes modules, but it was for the "non-state" metricsets, and for autodiscover. In principle this wouldn't help here. You can find the PR here: https://github.com/elastic/beats/pull/23257

kube-state-metrics supports sharding. Maybe an approach could be to use sharding there, and have different Metricbeat instances querying each one of the shards. I have never tried this, but it might work.