Using container_cpu_usage_total in Elasticsearch

I'm collecting metrics from Kubernetes with Prometheus and sending them to Elasticsearch through the Metricbeat Prometheus module (using Prometheus federation).

So I have the following field in my Elasticsearch index: prometheus.metrics.container_cpu_usage_total. I'm trying to build a TSVB (Kibana) dashboard on it to get the CPU usage percentage.


I expected the "Derivative" aggregation to behave like the Prometheus rate function, but the numbers in the dashboard don't make sense to me: they are very different from what the kubectl top pods command reports.

For example, kubectl top pods reports 234m (millicores) for my pod, while the dashboard shows values varying between 80 and 90.
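For reference, here is a minimal sketch of what a Prometheus-style rate of this counter means in millicores. It assumes container_cpu_usage_total is a cumulative count of CPU seconds (as cAdvisor's CPU usage counter is) and uses made-up sample values. CPU seconds consumed per wall-clock second is a number of cores, and multiplying by 1000 gives millicores, the unit kubectl top pods prints.

```python
def millicores(samples):
    """Average CPU usage in millicores between the first and last sample.

    samples: list of (unix_timestamp_seconds, cumulative_cpu_seconds) pairs,
    assumed sorted by time and free of counter resets.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    cores = (v1 - v0) / (t1 - t0)  # CPU seconds consumed per wall-clock second
    return cores * 1000            # 1 core = 1000 millicores


# Hypothetical scrapes 60 s apart: 14.04 CPU seconds used in 60 s.
samples = [(0, 100.0), (60, 114.04)]
print(round(millicores(samples)))  # 234
```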

Hi @Ronaldo_Lanhellas

First, you could use the Counter Rate aggregation, which does the max, derivative and positive-only steps all in one go.
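The three steps that Counter Rate bundles together can be illustrated in plain Python. This is only a sketch of the idea on hypothetical bucketed samples, not Kibana's actual implementation; in particular, marking negative deltas as None is this sketch's choice for "positive only".

```python
def counter_rate(buckets):
    """Max per bucket, then derivative, then positive only.

    buckets: list of lists; each inner list holds the raw counter samples
    that fell into one time bucket (assumed non-empty, in time order).
    Returns one value per bucket after the first: the increase of the
    bucket max, or None where the counter went down (e.g. after a reset).
    """
    maxes = [max(b) for b in buckets]                # step 1: max per bucket
    rates = []
    for prev, cur in zip(maxes, maxes[1:]):
        delta = cur - prev                           # step 2: derivative
        rates.append(delta if delta >= 0 else None)  # step 3: positive only
    return rates


buckets = [[10, 12], [15, 18], [3, 5], [9]]  # counter reset in the third bucket
print(counter_rate(buckets))  # [6, None, 4]
```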

Second, you didn't pick a unit (1s, 1m, etc.). I'm not sure what it defaults to.
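Why the unit matters can be shown with a small sketch: the same increase measured over one bucket gives numbers that differ by a factor of 60 depending on whether it is expressed per second or per minute. The bucket size and increase below are hypothetical, and the "per second = cores" reading assumes the counter is in CPU seconds.

```python
def per_unit(delta_per_bucket, bucket_seconds, unit_seconds):
    """Rescale an increase measured over one bucket to an increase per unit."""
    return delta_per_bucket * unit_seconds / bucket_seconds


delta = 14.04  # hypothetical increase of the counter within one 60 s bucket
print(round(per_unit(delta, 60, 1), 3))   # 0.234  (per second, i.e. cores)
print(round(per_unit(delta, 60, 60), 3))  # 14.04  (per minute)
```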

Give that a try and see how it looks.

I'm doing it this way:

Is this what you suggested?

Yes. How does it look?

That assumes prometheus.metrics.container_cpu_usage_total is a monotonically increasing counter value.
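One quick way to sanity-check that assumption is to pull the values for a single series, in time order, and look for places where they go down. A decrease means either a genuine counter reset (e.g. a container restart) or samples from several containers interleaved in one series, in which case the rate has to be computed per container before anything is summed. The values below are made up.

```python
def counter_resets(values):
    """Return the indices where a supposedly monotonic counter decreased."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


print(counter_resets([100.0, 114.0, 130.5, 2.1, 9.8]))  # [3]
```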

Also, under Panel options you should set Interval to at least your collection interval; for example, if you are collecting every minute, set it to >=1m.
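The reason for this setting can be sketched with hypothetical once-a-minute scrapes: with 30 s buckets every other bucket is empty, so a max/derivative pipeline has gaps to deal with, while 60 s buckets each receive exactly one sample.

```python
def bucketize(timestamps, interval):
    """Count how many samples land in each fixed-width time bucket."""
    start, end = min(timestamps), max(timestamps)
    n = int((end - start) // interval) + 1
    counts = [0] * n
    for t in timestamps:
        counts[int((t - start) // interval)] += 1
    return counts


scrapes = [0, 60, 120, 180]  # hypothetical: one sample per minute
print(bucketize(scrapes, 30))  # [1, 0, 1, 0, 1, 0, 1]
print(bucketize(scrapes, 60))  # [1, 1, 1, 1]
```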

You can also format the units etc. if you want.

I did everything you suggested, but the value doesn't match my kubectl top nodes output.

This is the output of kubectl top nodes:

[Screenshot 2021-05-13 163958: kubectl top nodes output]

As you can see, I have about 12% CPU usage, and this is what I now get in Kibana:

I get about 14%, sometimes up to 19%, but kubectl top nodes stays at 12%. Honestly, I don't know which is correct, Kibana or kubectl; I'd like to believe kubectl is.

Hmm, not sure.

All Kibana does is take the values from the Prometheus exporter and do the math to display them. In my experience, it is often hard to get exact matches between a command-line tool and a metrics collection and visualization pipeline. The Counter Rate in Kibana is well tested; I use it all the time for network metrics and it is pretty solid (which is not to say there couldn't be an issue).

You might need to do some deep reading on how kubectl displays data versus how the data is collected and reported by Prometheus. I suspect there are nuances, similar to the known nuances in how memory (not CPU) is collected.

Me? I would probably lean towards the Prometheus-collected metrics (assuming everything is configured correctly). Prometheus has a very wide user base and many people monitor their K8s clusters with Prometheus collectors, so if there were issues I suspect they would have been reported and fixed.

I don't have a K8s cluster up and running right now, so I can't compare.

Perhaps someone else can chime in.

Thanks for your answer. Do you think I can use "Counter Rate" as a percentage value, or should I use another calculation to convert the "Counter Rate" into a usage percentage?

OK, let's back up a bit... I wasn't reading carefully, apologies.

I was focused on showing you the correct way to calculate a rate. What we just calculated is the rate of container_cpu_usage_total, i.e. the rate of CPU consumption (and we should have summed the rates across all the containers with a Series Agg anyway, which we have not done yet). That is not the CPU percentage, which I now understand is what you want.
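The summing step can be sketched outside Kibana as follows: compute the rate per container first, then add the per-container rates up per node. The node and container names and the rates (in cores) are hypothetical, and this only mirrors the idea of a Series Agg sum, not how TSVB implements it.

```python
from collections import defaultdict


def node_totals(container_rates):
    """Sum per-container CPU rates (in cores) into one total per node.

    container_rates: iterable of (node, container, cores) tuples, where each
    rate was already computed per container (rate first, then sum).
    """
    totals = defaultdict(float)
    for node, _container, cores in container_rates:
        totals[node] += cores
    return dict(totals)


rates = [("node-a", "api", 0.25), ("node-a", "db", 0.5), ("node-b", "api", 0.125)]
print(node_totals(rates))  # {'node-a': 0.75, 'node-b': 0.125}
```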

So do you want:
A) The Total CPU Percent for All Containers Per Node?
B) The CPU Percent Per Container per Node?
C) Both?

So I think we need to rethink this. Yes, you can do it, but it is going to take more work. (Of course, most of this would be done for you automatically if you used Metricbeat's own collection instead 🙂)

I'm not a Prometheus expert, but it looks like something like these would be the kinds of calculations we need to do.

Pick A) or B) above, and then find the other fields you need, such as total_cores, the collection rate, etc.
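Once the rate is in cores (CPU seconds per second) and a core count is known, the percentage is just a ratio. Here is a sketch of options A) and B), assuming a hypothetical node with 4 cores standing in for a total_cores-style value; the actual field name and value depend on your exporter.

```python
def cpu_percent(cores_used, total_cores):
    """CPU usage as a percentage of the cores available."""
    return 100.0 * cores_used / total_cores


node_cores = 4  # hypothetical stand-in for a total_cores-style value
per_container = {"api": 0.25, "db": 0.5}  # hypothetical rates, in cores

# B) percent per container on that node
for name, cores in per_container.items():
    print(name, cpu_percent(cores, node_cores))  # api 6.25, then db 12.5

# A) total percent for all containers on the node
print(cpu_percent(sum(per_container.values()), node_cores))  # 18.75
```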

Then perhaps I can help. If it is for all containers, we will need to sum the container_cpu_usage_total rates; if it is per container, the data will need to be broken down by container.

Then it will look something like this (this is not correct, just an example). We will need to use a Bucket Script etc. and apply the math from the Stack Overflow answer.
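In raw Elasticsearch terms, a bucket_script pipeline aggregation evaluates a script such as params.rate / params.cores * 100 once per histogram bucket. The sketch below shows that fragment as a Python dict, plus a plain-Python emulation of evaluating it bucket by bucket. The names cpu_rate and cores are hypothetical sibling metrics inside a date_histogram, not fields from this index, and the skip-on-gap behaviour only loosely mirrors the "skip" gap policy.

```python
# Hypothetical: "cpu_rate" and "cores" are placeholder sibling metrics
# inside a date_histogram; adjust to whatever your aggregation defines.
cpu_percent_agg = {
    "bucket_script": {
        "buckets_path": {"rate": "cpu_rate", "cores": "cores"},
        "script": "params.rate / params.cores * 100",
    }
}


def apply_bucket_script(buckets):
    """Evaluate rate / cores * 100 per bucket, skipping buckets with gaps."""
    out = []
    for b in buckets:
        if b.get("cpu_rate") is None or b.get("cores") is None:
            continue  # no value for this bucket, so it is skipped
        out.append(b["cpu_rate"] / b["cores"] * 100)
    return out


buckets = [{"cpu_rate": 0.5, "cores": 4}, {"cpu_rate": None, "cores": 4},
           {"cpu_rate": 1.0, "cores": 4}]
print(apply_bucket_script(buckets))  # [12.5, 25.0]
```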

Thanks for your detailed answer, I will follow your advice.
