Metricbeat Windows Perfmon Module Inconsistent Metric Intervals


I am using Metricbeat to monitor a Windows box, and I have noticed that the Windows Perfmon metrics are not reported consistently. I have the period configured to 60s, but there are gaps in the data in Elasticsearch, ranging anywhere from 2 to 5 minutes between documents.

Other modules (such as system and windows service) from this machine are working fine though as I can see them being reported every minute.

Looking at the logs for metricbeat I am not seeing any errors or warnings.

Metricbeat version: metricbeat-6.1.3
Operating System: Windows Server 2016

metricbeat.modules:
- module: system
  enabled: true
  period: 60s
  processes: ['.*']
  cpu_ticks: false
  metricsets:
    - cpu
    # - load  does not work on windows
    - filesystem    # can only report on and monitor the C: drive. Does not work with network drives
    - memory
    - network
    - process
    - core
    - diskio        # does not seem to work on windows as it reports only 0s
    - fsstat
    # - socket  does not work on windows
    - process_summary
    - uptime
  processors:
    - drop_event.when.regexp.mount_point: '^/(sys|cgroup|proc|dev|etc|host)($|/)'

# To get a complete list of possible performance counters, run the following
# command in CMD on the Windows machine you wish to monitor:
#   C:\> TypePerf -q > counters.txt
- module: windows
  metricsets: ["perfmon"]
  period: 60s
  perfmon.counters:
    - instance_label: ""
      instance_name: "Total"
      measurement_label: ""
      query: '\Processor Information(_Total)\% Processor Time'
    - instance_label: ""
      measurement_label: "diskio.write.bytes"
      query: '\PhysicalDisk(*)\Disk Writes/sec'
      format: "long"
    - instance_label: ""
      measurement_label: ""
      query: '\Process(*)\IO Write Bytes/sec'
      format: "long"

- module: windows
  metricsets: ["service"]
  period: 60s

output.kafka:
  enabled: true
  hosts: [...]
  topic: '...'
  version: ""
  required_acks: 1
  client_id: "..."
  worker: 1
  max_retries: 3
  bulk_max_size: 2048
  timeout: 30s
  broker_timeout: 10s
  channel_buffer_size: 256
  keep_alive: 0
  compression: snappy
  max_message_bytes: 1000000
  partition.round_robin:
    reachable_only: true
  metadata:
    refresh_frequency: 10m
    retry.max: 3
    retry.backoff: 250ms

logging:
  level: warning
  to_files: true
  to_syslog: false

Hello @chuhuethaoattv,

I think the module hits an error and doesn't send the current metric. This was fixed in a more recent version of Metricbeat; see the following PR for more details.

I suggest you update to metricbeat 6.3.0 and see if it fixes the problem.

Thank you @pierhugues,

That helped me to track down the issue. Somewhere along the way the error message below was being thrown. Do you know if there is a thread/issue for the error below? I tried searching but I could not find anything.

failed reading counters: 1 error: The returned data is not valid.
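
For anyone trying to surface this message in their own logs: the configuration posted above logs at the `warning` level, which may hide it. A minimal sketch of a more verbose logging section follows, assuming the perfmon read failure is logged below `warning` in the version in use (that is an assumption, not something confirmed in this thread). Debug output is noisy, so it is probably best enabled only while reproducing the gaps.

```yaml
# Sketch only: temporarily raise Metricbeat's log verbosity.
# Assumption: the perfmon error is emitted below the "warning" level.
logging:
  level: debug        # the posted config uses "warning"
  selectors: ["*"]    # emit debug messages from all components
  to_files: true
  to_syslog: false
```

Once the error has been captured, the level can be set back to `warning`.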
