Incorrect CPU usage and core count reading in metricbeat

I'm using system.cpu.total.norm.pct to track average CPU usage, but on some servers the value is wrong. I suspect this is caused by an incorrect system.cpu.cores reading.

On those servers, system.cpu.total.norm.pct is nearly double the value reported by the performance counter or Task Manager.

Task Manager Data:
Sockets: 4
Cores: 64
Logical processors: 128

Metricbeat Reading:
system.cpu.cores: 32

Spec:
Metricbeat version: 7.2.0
Processor: Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10 GHz (4 processors)
Windows: Windows Server 2016 Standard
Windows Build: 14393.3085

It's worth mentioning that the reading from the windows module, which uses performance counters, is accurate:

 - instance_label: processor.name
   instance_name: Total
   measurement_label: processor.time.total.pct
   query: '\Processor Information(_Total)\% Processor Time'

This is interesting, as that metric should be taken from Go's runtime.NumCPU function. A value of 32 would make sense if something somewhere was doing a naive 128/4. Do you have more example metrics from Metricbeat you could show us?
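One way to narrow this down is to check what the Go runtime itself reports on the affected server, independent of Metricbeat. The following is a minimal sketch, not Metricbeat's actual code; the helper name `logicalCPUs` is hypothetical, and only `runtime.NumCPU` comes from the Go standard library.

```go
package main

import (
	"fmt"
	"runtime"
)

// logicalCPUs is a hypothetical helper that returns the number of
// logical CPUs usable by the current process, as seen by the Go runtime.
func logicalCPUs() int {
	return runtime.NumCPU()
}

func main() {
	// Compare this number with the "Logical processors" value in Task Manager.
	fmt.Printf("runtime.NumCPU() = %d\n", logicalCPUs())
}
```

If this also prints 32 on the server, the discrepancy would sit below Metricbeat, in how the process sees the machine's processors.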

Here is a complete event as JSON:

{
  "_index": "metricbeat-7.2.0-2019.08.13",
  "_type": "_doc",
  "_id": "9IORimwB6uDCkCJKZW6z",
  "_score": 1,
  "_source": {
    "@timestamp": "2019-08-13T10:41:19.281Z",
    "service": {
      "type": "system"
    },
    "system": {
      "cpu": {
        "softirq": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "steal": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "iowait": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "irq": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "nice": {
          "pct": 0,
          "norm": {
            "pct": 0
          }
        },
        "total": {
          "pct": 0.1186,
          "norm": {
            "pct": 0.0037
          }
        },
        "cores": 32,
        "user": {
          "pct": 0.0687,
          "norm": {
            "pct": 0.0021
          }
        },
        "system": {
          "pct": 0.0499,
          "norm": {
            "pct": 0.0016
          }
        },
        "idle": {
          "norm": {
            "pct": 0.9963
          },
          "pct": 31.8814
        }
      }
    },
    "tags": [
      ""
    ],
    "ecs": {
      "version": "1.0.0"
    },
    "host": {
      "os": {
        "platform": "windows",
        "version": "10.0",
        "family": "windows",
        "name": "Windows Server 2016 Standard",
        "kernel": "10.0.14393.3085 (rs1_release.190703-1816)",
        "build": "14393.3085"
      },
      "id": "8701d92f-4a53-4e49-99c4-8b8ec621786b",
      "hostname": "SRV100",
      "architecture": "x86_64",
      "name": "SERVER100"
    },
    "agent": {
      "type": "metricbeat",
      "ephemeral_id": "daff776e-3008-4b4d-b2a6-2030b0afbe65",
      "hostname": "SRV100",
      "id": "f1a0818b-fa3e-497a-a199-5d0dab26c1e5",
      "version": "7.2.0",
      "name": "SERVER100"
    },
    "event": {
      "dataset": "system.cpu",
      "module": "system",
      "duration": 654000
    },
    "metricset": {
      "name": "cpu"
    }
  },
  "fields": {
    "@timestamp": [
      "2019-08-13T10:41:19.281Z"
    ]
  }
}
