Perfmon counter: The returned value is not valid

Hi,

I am currently using Metricbeat 7.9.0 on Windows to monitor the machine's CPU usage.

windows.yml:

    - module: windows
      metricsets: [perfmon]
      period: 20s
      perfmon.ignore_non_existent_counters: true
      perfmon.group_measurements_by_instance: true
      perfmon.queries:
      - object: 'Processor Information'
        instance: ['_Total']
        counters:
        - name: '% Processor Utility'

Running typeperf on the same Windows machine returns valid values:

    C:\Program Files\Metricbeat>typeperf "\processor information(_total)\% processor utility"

    "(PDH-CSV 4.0)","\\B-RDP\processor information(_total)\% processor utility"
    "09/04/2020 10:00:44.970","12.599742"
    "09/04/2020 10:00:45.970","12.262473"
    "09/04/2020 10:00:46.970","13.776128"
    "09/04/2020 10:00:47.986","15.459835"
    "09/04/2020 10:00:49.001","16.701235"
    "09/04/2020 10:00:50.017","16.010404"
    "09/04/2020 10:00:51.033","15.780566"
    "09/04/2020 10:00:52.033","15.211278"
    "09/04/2020 10:00:53.048","12.962887"
    "09/04/2020 10:00:54.048","11.272148"
    "09/04/2020 10:00:55.050","13.951118"
    "09/04/2020 10:00:56.052","15.423948"
    "09/04/2020 10:00:57.058","17.593544"
    "09/04/2020 10:00:58.074","14.035282"

In my Elasticsearch cluster, this particular value is missing from time to time because Metricbeat intermittently fails to send it.
Running Metricbeat with full debug logging reports the following error:

    perfmon/data.go:55 Counter value retrieval returned {"error": "The data is not valid.", "cstatus": "The returned data is not valid.", "perfmon": {"query": "\\Processor Information(_Total)\\% Processor Utility"}}

Additional graph plot to describe the behavior:

I have tried running Metricbeat with other perfmon counters and they all seem to work fine. Only this one is causing trouble.

Does anyone have an idea what's going on here?

I'd appreciate any help in resolving this.

Thanks.

Regards,
Song Lim

Hi @sleepy, we recently added the CStatus information to the logs to provide more detail when executing the PDH API calls.
For more information on the CStatus values, have a look here: https://docs.microsoft.com/en-us/windows/win32/perfctrs/checking-pdh-interface-return-values.
From the looks of it, you are hitting this scenario:

The counter was successfully found, but the data returned is not valid. This error can occur if the counter value is less than the previous value. (Because counter values always increment, the counter value rolls over to zero when it reaches its maximum value.) Another possible cause is a system timer that is not correct.
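To illustrate why a rate-style counter can intermittently come back as "not valid", here is a minimal Python sketch. It assumes a simplified model in which each formatted value is computed from two consecutive raw samples (a cumulative value plus a timestamp). This is not the actual PDH calculation for % Processor Utility, and the names `RawSample` and `compute_rate` are hypothetical, used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawSample:
    """Hypothetical raw counter sample: a cumulative value and a timestamp in ticks."""
    value: int
    timestamp: int


def compute_rate(previous: RawSample, current: RawSample) -> Optional[float]:
    """Return the per-tick rate between two samples, or None if the data is invalid.

    Simplified model (an assumption, not the real PDH calculation):
    - a cumulative counter should never decrease; if it does (e.g. it rolled
      over to zero), the value delta is negative and the result is discarded;
    - time must move forward; a zero or negative time delta (e.g. a system
      timer that is not correct) also makes the result invalid.
    """
    value_delta = current.value - previous.value
    time_delta = current.timestamp - previous.timestamp
    if value_delta < 0 or time_delta <= 0:
        return None  # analogous to an "invalid data" status
    return value_delta / time_delta


samples = [
    RawSample(value=1_000, timestamp=100),
    RawSample(value=1_500, timestamp=110),  # normal increment
    RawSample(value=200, timestamp=120),    # counter rolled over
    RawSample(value=700, timestamp=120),    # timer did not advance
    RawSample(value=1_200, timestamp=130),  # valid again
]

# Compute a value for each consecutive pair of samples.
for prev, cur in zip(samples, samples[1:]):
    rate = compute_rate(prev, cur)
    print("invalid" if rate is None else f"{rate:.1f}")
```

In this model a single bad sample pair yields a missing data point while the samples on either side are fine, which would match the intermittent gaps described above.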

Hi,

Thanks for the information. I have tried searching for the CStatus error, but I still can't work out why it is happening.

Could you provide some insight into how or where I could debug it?

Thanks a lot.