Observing a slow sample rate for module: kubernetes, metricset: state_container on a cluster with approximately 2000 containers. Documents are ingested roughly every 13 seconds instead of every 10 seconds.
- Versions:
metricbeat: 7.12.0 & 6.8.2
kubernetes: 1.19.2
kube-state-metrics: v1.9.8
- Steps to Reproduce:
Dedicate a metricbeat deployment to collecting only container states at a 10s interval:
- module: kubernetes
  metricsets:
    - state_container
  period: 10s
  host: ${NODE_NAME}
  hosts: ["kube-state-metrics:8080"]
Kibana shows a consistent sample interval of roughly 13 seconds where 10 seconds is expected:
Time kubernetes.pod.name kubernetes.container.cpu.request.cores
Apr 7, 2021 @ 15:35:05.764 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:34:52.727 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:34:39.529 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:34:26.411 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:34:13.165 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:34:00.033 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:33:47.448 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:33:34.743 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:33:21.628 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:33:08.755 kyverno-598d67f5f9-95r95 0.2
Apr 7, 2021 @ 15:32:55.184 kyverno-598d67f5f9-95r95 0.2
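To quantify the drift, the gaps between the consecutive Kibana timestamps listed above can be computed directly. The following is a minimal Python sketch; the names `TIMESTAMPS` and `sample_intervals` are illustrative and not part of metricbeat or Kibana.

```python
from datetime import datetime

# Timestamps copied from the Kibana table above (newest first).
TIMESTAMPS = [
    "Apr 7, 2021 @ 15:35:05.764",
    "Apr 7, 2021 @ 15:34:52.727",
    "Apr 7, 2021 @ 15:34:39.529",
    "Apr 7, 2021 @ 15:34:26.411",
    "Apr 7, 2021 @ 15:34:13.165",
    "Apr 7, 2021 @ 15:34:00.033",
    "Apr 7, 2021 @ 15:33:47.448",
    "Apr 7, 2021 @ 15:33:34.743",
    "Apr 7, 2021 @ 15:33:21.628",
    "Apr 7, 2021 @ 15:33:08.755",
    "Apr 7, 2021 @ 15:32:55.184",
]


def sample_intervals(stamps):
    """Return the gaps in seconds between consecutive samples, oldest first."""
    parsed = sorted(
        datetime.strptime(s, "%b %d, %Y @ %H:%M:%S.%f") for s in stamps
    )
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


intervals = sample_intervals(TIMESTAMPS)
mean_interval = sum(intervals) / len(intervals)
print(f"min={min(intervals):.3f}s max={max(intervals):.3f}s mean={mean_interval:.3f}s")
```

For the eleven samples above this gives intervals between about 12.6 s and 13.6 s, averaging about 13.1 s, which is roughly 3 s longer than the configured period.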
- Observations:
The number of state_container events that metricbeat reports fluctuates between roughly 4450 and 6680 per 30-second monitoring interval.
metricbeat log output:
2021-04-07T22:16:16.152Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4452,"success":4452}}}
2021-04-07T22:16:46.152Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4452,"success":4452}}}
2021-04-07T22:17:16.152Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":6680,"success":6680}}}
2021-04-07T22:17:46.152Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4456,"success":4456}}}
2021-04-07T22:18:16.151Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4455,"success":4455}}}
2021-04-07T22:18:46.153Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4455,"success":4455}}}
2021-04-07T22:19:16.152Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":6686,"success":6686}}}
2021-04-07T22:19:46.151Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4454,"success":4454}}}
2021-04-07T22:20:16.152Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4452,"success":4452}}}
2021-04-07T22:20:46.151Z INFO [monitoring] ... metricbeat:{"kubernetes":{"state_container":{"events":4458,"success":4458}}}
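One possible reading of this pattern, offered as a hypothesis rather than a confirmed explanation: the [monitoring] lines are logged every 30 seconds, so with a fetch completing roughly every 13 seconds a window would contain either two or three fetches. The Python sketch below checks whether the logged counts are consistent with that. The 2-versus-3 split and the function name `events_per_fetch` are assumptions made for illustration, not values reported by metricbeat.

```python
# Event counts from the [monitoring] log lines above (one line per 30 s).
EVENTS_PER_WINDOW = [4452, 4452, 6680, 4456, 4455, 4455, 6686, 4454, 4452, 4458]


def events_per_fetch(events, low_fetches=2, high_fetches=3):
    """Divide each window's count by an assumed number of fetches.

    Assumption: windows with a small count held `low_fetches` fetches and
    windows with a large count held `high_fetches`. This split is inferred
    from a ~13 s interval inside a 30 s window; metricbeat does not report it.
    """
    threshold = (min(events) + max(events)) / 2
    return [
        n / (high_fetches if n > threshold else low_fetches) for n in events
    ]


per_fetch = events_per_fetch(EVENTS_PER_WINDOW)
print([round(x, 1) for x in per_fetch])
```

Under that assumption every window works out to between 2226 and 2229 events per fetch, which matches a cluster of roughly 2000 containers. If the assumption holds, the fluctuation would be an artifact of the 30-second reporting window rather than a change in what is collected.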
Fetching the roughly 300k metric lines (about 46 MB) from kube-state-metrics takes around 2 seconds:
$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr 7 22:09:03 UTC 2021
294381 591129 46625296
real 0m 1.96s
user 0m 0.01s
sys 0m 0.15s
$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr 7 22:09:09 UTC 2021
294381 591129 46625296
real 0m 1.97s
user 0m 0.05s
sys 0m 0.12s
$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr 7 22:09:15 UTC 2021
294382 591131 46625426
real 0m 2.05s
user 0m 0.02s
sys 0m 0.14s
$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr 7 22:09:22 UTC 2021
294382 591131 46625458
real 0m 2.07s
user 0m 0.03s
sys 0m 0.13s
$ date && time curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | wc
Wed Apr 7 22:09:30 UTC 2021
294382 591131 46625458
real 0m 2.00s
user 0m 0.02s
sys 0m 0.13s
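For reference, the five curl measurements above can be summarised as follows. This is a small Python sketch with illustrative names (`RUNS`, `PERIOD_S`); it only restates numbers already shown and says nothing about how long metricbeat itself spends parsing the response.

```python
# (lines, bytes, wall-clock seconds) from the five `curl | wc` runs above.
RUNS = [
    (294381, 46625296, 1.96),
    (294381, 46625296, 1.97),
    (294382, 46625426, 2.05),
    (294382, 46625458, 2.07),
    (294382, 46625458, 2.00),
]
PERIOD_S = 10  # configured metricset period in seconds

mean_time = sum(t for _, _, t in RUNS) / len(RUNS)
mean_mb = sum(b for _, b, _ in RUNS) / len(RUNS) / 1e6
print(
    f"mean download: {mean_time:.2f}s for {mean_mb:.1f} MB "
    f"({mean_time / PERIOD_S:.0%} of the {PERIOD_S}s period)"
)
```

The raw download therefore accounts for about 2 s, or roughly a fifth of the 10-second period, so it does not by itself explain why samples arrive about 13 s apart.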
The metricbeat deployment is not bounded by memory or CPU (no CPU limit is set, and observed memory usage is well below the 2Gi limit):
name: metricbeat
resources:
  limits:
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 100Mi
$ k top pod -n kube-system metricbeat-756d654796-ws4dc
NAME CPU(cores) MEMORY(bytes)
metricbeat-756d654796-ws4dc 1285m 400Mi
All other state_* metricsets ingest without issue every 10 seconds.
Questions:
- Is there a known upper limit to the number of containers that can be monitored in a 10-second interval?
- Why would the number of metricbeat state_container events fluctuate between roughly 4000 and 6000?
- How can state_container collection be scaled to handle more than 2000 containers?