Hello,
I'm playing with ELK/Beats to learn and see what is out there. I deployed Metricbeat and I can confirm that data is making its way to ELK (specifically, I send the events to Logstash). In the Kibana UI I can see the data, but surprisingly it is not what I expect.
I put load on the box to produce three levels:
0% CPU usage
50% CPU usage
100% CPU usage
In other words, 0 full CPUs used, 1 full CPU used, or 2 full CPUs used. I applied this load on purpose to confirm whether Kibana would show the numbers properly.
I plotted a Kibana visualization of system.processes.cpu.total.pct and also of system.cpu.total.pct. The graphs show values of 0, 1 and 2, whereas I was expecting values of 0%, 50% and 100%.
So these metrics are not reporting what percentage of total CPU is busy (0 to 100%); instead they report how many CPUs are in use (0, 1, 2). This is wrong and unexpected. I checked many other CPU-related KPIs and they behave the same way.
My OS is CentOS Linux release 7.4.1708, kernel 3.10.0-693.el7.x86_64 (hostname apache).
The version of ELK is 6.2.4 (the latest).
I have 2 CPUs:
[root@apache metricbeat]# cat /proc/cpuinfo |grep -i proc
processor : 0
processor : 1
[root@apache metricbeat]#
Among other tests, when I made 1 CPU busy, vmstat showed 50% CPU idle (50% busy):
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 1007640 2116 551528 0 0 0 0 1080 93 50 0 50 0 0
1 0 0 1007640 2116 551528 0 0 0 0 1079 88 49 1 50 0 0
1 0 0 1007640 2116 551532 0 0 0 0 1091 113 50 0 50 0 0
1 0 0 1007640 2116 551532 0 0 0 0 1078 89 50 0 50 0 0
1 0 0 1007640 2116 551532 0 0 0 3 1079 93 50 0 50 0 0
1 0 0 1007640 2116 551532 0 0 0 0 1071 84 49 0 50 0 0
However, the Metricbeat console output showed:
2018-05-10T20:43:49.764-0700 DEBUG [logstash] logstash/async.go:142 17 events out of 17 events sent to logstash host 192.168.1.109:5443. Continue sending
2018-05-10T20:43:58.763-0700 DEBUG [publish] pipeline/processor.go:275 Publish event: {
  "@timestamp": "2018-05-11T03:43:58.762Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "doc",
    "version": "6.2.4"
  },
  "metricset": {
    "name": "cpu",
    "module": "system",
    "rtt": 156
  },
  "system": {
    "cpu": {
      "cores": 2,
      "nice": {
        "pct": 0
      },
      "softirq": {
        "pct": 0.001
      },
      "user": {
        "pct": 0.9945
      },
      "idle": {
        "pct": 0.9965
      },
      "irq": {
        "pct": 0
      },
      "iowait": {
        "pct": 0
      },
      "steal": {
        "pct": 0
      },
      "system": {
        "pct": 0.008
      },
      "total": {
        "pct": 1.0035
      }
    }
  },
  "beat": {
    "version": "6.2.4",
    "name": "apache",
    "hostname": "apache"
  }
}
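As you can see, the raw event itself reports a total pct of 1.0035, which looks wrong to me for a box that vmstat shows as 50% busy.
So it looks like a factor of 50 is missing somewhere: 1.0035 * 50 ≈ 50, which is the 50% usage I expected (the 50 being 100 / 2 cores).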
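To make that arithmetic concrete, here is a minimal Python sketch. It assumes (this is my guess, not something I have confirmed) that system.cpu.total.pct is a sum over all cores on a 0..cores scale, so dividing by system.cpu.cores and multiplying by 100 should give a vmstat-style percentage. The helper name normalized_cpu_percent is made up for illustration, and the event is a trimmed copy of the one above.

```python
import json

# Trimmed copy of the Metricbeat event shown above; only the fields
# needed for the calculation are kept.
event = json.loads("""
{
  "system": {
    "cpu": {
      "cores": 2,
      "idle": {"pct": 0.9965},
      "total": {"pct": 1.0035}
    }
  }
}
""")


def normalized_cpu_percent(event):
    """Hypothetical helper: rescale total.pct to a 0-100% figure.

    Assumption: total.pct is summed across cores (range 0..cores),
    so dividing by the core count gives the busy fraction of the
    whole machine.
    """
    cpu = event["system"]["cpu"]
    return cpu["total"]["pct"] / cpu["cores"] * 100


print(round(normalized_cpu_percent(event), 1))  # 50.2
```

With the values from my event this comes out at roughly 50%, which matches what vmstat reported for the same load.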
I noticed the thread "Need to understand metricbeat cpu metrics"; it looks like somebody else has seen the same behavior.
Can you please point me to what I should do here?
Many thanks
Luis