Metricbeats Windows metricset Invalid data

Hi-
I'm using Windows metricset to collect perfmon data, but I noticed some unreal data for couple of counters which I noticed this in the wildcard query (*), where as value of _Total shows 345 but for a single process the value shows beyond 700.
And also when I execute the typeperf query on the server, the values does show as '0' continuously, Is there any other how could I debug this issue ?

And also, I created the dataset in perfmon and dumped into csv for the same process, it shows as '0'. but only metricbeats shows huge value for the same process. :frowning:

@maddin2016, @andrewkroh - Any help on this ? Thanks !

Hi @paul1243,

Could you share the metricbeat configuration you are using for the perfmon metricset?

Hi @jsoriano-
Thanks for your reply.
Here is the configuration:

- module: windows
  metricsets: [perfmon]
  period: 5s
  perfmon.ignore_non_existent_counters: true
  perfmon.counters:
    - instance_label: network_interface_bytes_total_sec
      measurement_label: network.interface.bytes.total.sec
      query: '\Network Interface(*)\Bytes Total/Sec'
      
    - instance_label: process_user_cpu_usage_pct
      measurement_label: process.user.cpu.usage.pct
      query: '\Process(*)\% User Time'
    
    - instance_label: process_virtual_bytes
      measurement_label: process.virtual.bytes
      query: '\Process(*)\Virtual Bytes'
    
    - instance_label: process_privileged_time
      measurement_label: process.privileged.time
      query: '\Process(*)\% Privileged Time'  
      
    - instance_label: process_processor_time
      measurement_label: process.processor.time
      query: '\Process(*)\% Processor Time'

Hi @paul1243,

Could you enable debug logs for perfmon to check if something is logged about this? It can be also useful if you enable debug logs for publish to see some of the events that metricbeat is publishing.

To enable debug for perfmon use -d perfmon, and to enable both -d perfmon,publish.

Hi @jsoriano
Sorry for the late reply.
I have enabled the debug mode and captured the logs, I still see the value for a process far more than the actual _Total value, Here are the documents:

2018-12-23T14:02:19.510+0100	DEBUG	[publish]	pipeline/processor.go:308	Publish event: {
  "@timestamp": "2018-12-23T13:02:19.505Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "doc",
    "version": "6.4.0"
  },
  "beat": {
    "hostname": "Test_VM"
  },
  "agent": {
    "hostname": "Test_VM",
    "version": "6.4.0",
    "type": "metricbeat"
  },
  "host": {
    "name": "Test_VM"
  },
  "metricset": {
    "name": "perfmon",
    "module": "windows",
    "rtt": 5010
  },
  "windows": {
    "perfmon": {
      "process_processor_time": "csrss",
      "process": {
        "processor": {
          "time": 1270.8631545054614
        }
      }
    }
  }
}
2018-12-23T14:02:19.510+0100	DEBUG	[publish]	pipeline/processor.go:308	Publish event: {
  "@timestamp": "2018-12-23T13:02:19.505Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "doc",
    "version": "6.4.0"
  },
  "agent": {
    "type": "metricbeat",
    "hostname": "Test_VM",
    "version": "6.4.0"
  },
  "host": {
    "name": "Test_VM"
  },
  "metricset": {
    "module": "windows",
    "rtt": 5010,
    "name": "perfmon"
  },
  "windows": {
    "perfmon": {
      "process_processor_time": "csrss",
      "process": {
        "processor": {
          "time": 0
        }
      }
    }
  },
  "beat": {
    "hostname": "Test_VM"
  }
}

This is _Total document:

2018-12-23T14:02:19.509+0100	DEBUG	[publish]	pipeline/processor.go:308	Publish event: {
  "@timestamp": "2018-12-23T13:02:19.505Z",
  "@metadata": {
    "beat": "metricbeat",
    "type": "doc",
    "version": "6.4.0"
  },
  "host": {
    "name": "Test_VM"
  },
  "agent": {
    "type": "metricbeat",
    "hostname": "Test_VM",
    "version": "6.4.0"
  },
  "metricset": {
    "rtt": 4007,
    "name": "perfmon",
    "module": "windows"
  },
  "windows": {
    "perfmon": {
      "process_processor_time": "_Total",
      "process": {
        "processor": {
          "time": 397.7892293300842
        }
      }
    }
  },
  "beat": {
    "hostname": "Test_VM"
  }
}

@paul1243, how many cores you have? And can you compare to what the windows performance monitor is showing you.

I assume you have a 4 core machine. Because '\Process(_Total)\% Processor Time is calculated by 100 * number of logical processors. The one value for for csrss indicates that this Process is extensively using Multithreading and using the Processor to more than its capacity. See here for a good explanation.

1 Like

Hi @maddin2016-
The server has got 4 cores, and I noticed 3 csrss processes running, I've added them to the data collector set and dumped in to a CSV, results are as below:

"(PDH-CSV 4.0) (Romance Standard Time)(-60)","\\Test_VM\Process(csrss)\% Processor Time","\\Test_VM\Process(csrss#1)\% Processor Time","\\Test_VM\Process(csrss#2)\% Processor Time"
"12/29/2018 17:36:08.205"," "," "," "
"12/29/2018 17:36:10.173","0","0","0"
"12/29/2018 17:36:12.171","0","0","0"
"12/29/2018 17:36:14.164","0","0","0"
"12/29/2018 17:36:16.173","0","0","0"
"12/29/2018 17:36:18.174","0","0","0"
"12/29/2018 17:36:20.174","0","0","0"
"12/29/2018 17:36:22.170","0","0","0"
"12/29/2018 17:36:24.169","0","0","0"
"12/29/2018 17:36:26.168","0","0","0"
"12/29/2018 17:36:28.173","0","0","0"
"12/29/2018 17:36:30.168","0","0","0"
"12/29/2018 17:36:32.162","0","0","0"
"12/29/2018 17:36:34.165","0","0","0"

But at the same time beats perfmon data is publishing around 1200+ value for the same process. :frowning:
I collected the data of the csrss process thru 'typeperf' query, it also returns value as '0'.

I have added the separate queries for the multiple csrss processes in the beats perfmon module as below and tried, but even now I receive the data as '0'.

    - instance_label: process_processor_time_csrss
      measurement_label: process.processor.time.csrss
      query: '\Process(csrss)\% Processor Time' 
        
    - instance_label: process_processor_time_csrss1
      measurement_label: process.processor.time.csrss1
      query: '\Process(csrss#1)\% Processor Time'

    - instance_label: process_processor_time_csrss2
      measurement_label: process.processor.time.csrss2
      query: '\Process(csrss#2)\% Processor Time' 

I believe, we see this unreal data only when we run with the wildcard query.

@maddin2016 How do you see this ? Are there any other debugging techniques ? Please do let me know. Thanks !

Hi @maddin2016 - Any comments on my above reply ? Thanks !

Hi @paul1243, i've created a PR where i refactored wildcard queries. But is has to be approved still. Maybe in the meantime you can test with source from my repo if the behavior is the same or if now the correct data is showing.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.