Can Metricbeat drill into an svchost.exe process to get resource usage for specific services?


(Jean-Pierre) #1

On Windows, many services run under svchost.exe. Sometimes a service runs a separate executable, but other times it's a Windows internal component such as a DLL. Currently, it's not documented whether it's possible to get metrics for internal DLL components and threads. It is, however, fairly easy in the normal case where a service runs its own exe.

Examples

1) exe

To look into the Winlogbeat service, the following registry key is applicable:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\winlogbeat

And the value of interest:

ImagePath = "C:\Program Files\Winlogbeat\\winlogbeat.exe" -c "C:\Program Files\Winlogbeat\\winlogbeat.yml" -path.home "C:\Program Files\Winlogbeat" -path.data "C:\\ProgramData\\winlogbeat"

For Metricbeat, it would be as simple as using this in the config:

metricbeat.modules:
- module: system
  processes:
  - 'winlogbeat'

2) DLL

To look into the Windows Remote Management Service usage, for example, the following registry key is applicable:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WinRM

From here, two registry key values are of interest:

ImagePath = %SystemRoot%\System32\svchost.exe -k NetworkService
Description = @%Systemroot%\system32\wsmsvc.dll,-102

So in order to know how many resources the WinRM service uses, one needs to drill into it further.

PS > Get-Process svchost | Select-Object -Property ProcessName,Id,Threads,Modules -ExpandProperty Modules | Where-Object {$_.ModuleName -eq 'wsmsvc.dll'} | Format-List


ProcessName       : svchost
Id                : 452
Threads           : {448, 392, 520, 492, 564, 1144, 1164, 1192, 1388, 1792, 1908, 1920, 1988, 1256, 1380, 2312, 47776,
                    41092, 46888, 45644}
Modules           : {System.Diagnostics.ProcessModule (svchost.exe), System.Diagnostics.ProcessModule (ntdll.dll),
                    ...
                    System.Diagnostics.ProcessModule (wsmsvc.dll), System.Diagnostics.ProcessModule (miutils.dll),
                    ...
                    System.Diagnostics.ProcessModule (wevtfwd.dll), System.Diagnostics.ProcessModule (CRYPTNET.dll),
                    System.Diagnostics.ProcessModule (wecsvc.dll), System.Diagnostics.ProcessModule (FirewallAPI.dll),
                    ...}
ModuleName        : wsmsvc.dll
FileName          : c:\windows\system32\wsmsvc.dll
BaseAddress       : 140736241139712
ModuleMemorySize  : 2625536
EntryPointAddress : 140736243099904
FileVersionInfo   : File:             c:\windows\system32\wsmsvc.dll
                    InternalName:     WsmSvc.dll
                    OriginalFilename: WsmSvc.dll.mui
                    FileVersion:      6.3.9600.16384 (winblue_rtm.130821-1623)
                    FileDescription:  WSMan Service
                    Product:          Microsoft® Windows® Operating System
                    ProductVersion:   6.3.9600.16384
                    Debug:            False
                    Patched:          False
                    PreRelease:       False
                    PrivateBuild:     False
                    SpecialBuild:     False
                    Language:         English (United States)

Site              :
Container         :
Size              : 2564
Company           : Microsoft Corporation
FileVersion       : 6.3.9600.16384 (winblue_rtm.130821-1623)
ProductVersion    : 6.3.9600.16384
Description       : WSMan Service
Product           : Microsoft® Windows® Operating System

At this point, it's unclear how to get Metricbeat to select the CPU/memory/IO usage of just the WsmSvc.dll module. The module is not a process name attribute, so it has to be filtered in another way.

I know that Sysinternals Process Explorer is able to get some of this info. However, it gets very messy. Here's a rough list of the steps involved.

  1. Search all svchost processes and find the process(es) with the module / DLL of interest, e.g. wsmsvc.dll.
  2. From the PID(s) found in step 1, get the memory locations, e.g. BaseAddress, ModuleMemorySize, etc., to know which memory range(s) the module is loaded into.
  3. From the PID(s), find the threads that have entry points within the address range from step 2.
  4. Collect and aggregate the resource usage for those threads.
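
Steps 1 and 2 can be sketched on plain data. This is a minimal Python illustration, not anything Metricbeat provides: the `Module` type, `find_module_ranges`, and the `MODULES_BY_PID` table are hypothetical names, and only the wsmsvc.dll base address and size are taken from the Get-Process output above (the other entries are made up).

```python
from typing import Dict, List, NamedTuple, Tuple


class Module(NamedTuple):
    name: str
    base: int  # BaseAddress as reported by Get-Process
    size: int  # ModuleMemorySize in bytes


def find_module_ranges(
    modules_by_pid: Dict[int, List[Module]], dll_name: str
) -> Dict[int, Tuple[int, int]]:
    """Map pid -> (start, end) for each process that has dll_name loaded.

    The end address is exclusive. Module names are compared
    case-insensitively, as Windows file names are.
    """
    wanted = dll_name.lower()
    ranges = {}
    for pid, modules in modules_by_pid.items():
        for module in modules:
            if module.name.lower() == wanted:
                ranges[pid] = (module.base, module.base + module.size)
    return ranges


# Hypothetical snapshot; only the wsmsvc.dll numbers come from the post.
MODULES_BY_PID = {
    452: [
        Module("svchost.exe", 0x7FF600000000, 0x10000),
        Module("wsmsvc.dll", 140736241139712, 2625536),
    ],
    980: [Module("svchost.exe", 0x7FF600000000, 0x10000)],
}

ranges = find_module_ranges(MODULES_BY_PID, "WsmSvc.dll")
start, end = ranges[452]
# The EntryPointAddress shown above falls inside the computed range.
print(start <= 140736243099904 < end)
```

Only PID 452 is returned because it is the only process in the sample with the DLL loaded; steps 3 and 4 would then compare thread addresses against that range.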

Given the above complexity, I'm assuming Metricbeat doesn't cater for this? At least not yet?

Workaround / Simplification

Configure the service to run in its own isolated svchost process and then get metrics from just that?

sc config <service name> type= own

(sc expects a space after type=.) However, this may have unintended consequences? For example, suppose services interact: Windows event collection depends on WinRM.

  • wsmsvc.dll (WinRM / Windows Remote Management)
  • wecsvc.dll (Windows Event Collector)

Splitting these two will force separate process memory spaces, so any performance advantages from shared memory, threading, etc. would be lost?


(Andrew Kroh) #2

Yeah, this seems like it would be possible. Metricbeat could get CPU usage per thread using GetThreadTimes, but you cannot get per-thread memory usage since mallocs will be tracked per process.
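
As a side note on units: GetThreadTimes reports kernel and user time as FILETIME values, which count 100-nanosecond ticks. A minimal Python sketch of turning two samples into a CPU percentage might look like this. The function names and the sample numbers are assumptions for illustration; reading the actual values from Windows is not shown.

```python
FILETIME_TICKS_PER_SECOND = 10_000_000  # one FILETIME tick is 100 ns


def filetime_to_seconds(ticks: int) -> float:
    """Convert a FILETIME duration (100 ns ticks) to seconds."""
    return ticks / FILETIME_TICKS_PER_SECOND


def cpu_percent(prev, curr, interval_seconds: float) -> float:
    """CPU usage of one thread between two samples.

    prev and curr are (kernel_ticks, user_ticks) pairs, as GetThreadTimes
    would report them; interval_seconds is the wall-clock time between
    the two samples. 100.0 means one fully busy logical CPU.
    """
    busy_ticks = (curr[0] - prev[0]) + (curr[1] - prev[1])
    return 100.0 * filetime_to_seconds(busy_ticks) / interval_seconds


# Hypothetical samples taken 10 seconds apart: 0.5 s kernel + 1.5 s user.
usage = cpu_percent((0, 0), (5_000_000, 15_000_000), 10.0)
print(usage)
```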

Here's some pseudo code that answers the question of what module each thread is operating in (I used it to think through the problem). It's based on the algorithm you gave above.

// Thread snapshots are system-wide; the pid argument is ignored for TH32CS_SNAPTHREAD.
threads = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0)
// Get the modules loaded by the process.
modules = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, pid)

while iterating Thread32First/Thread32Next over threads
  skip the thread unless th32OwnerProcessID == pid
  t = OpenThread(THREAD_QUERY_INFORMATION, th32ThreadID)
  // Get the start address of the thread.
  threadAddr = NtQueryInformationThread(t, ThreadQuerySetWin32StartAddress)
  cpuTimes = GetThreadTimes(t)

  while iterating Module32First/Module32Next over modules
    if threadAddr is within [modBaseAddr, modBaseAddr + modBaseSize)
    then aggregate this thread's CPU times with other threads in the same module
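
The matching and aggregation part of that pseudo code can be tried out on plain data. The Python below is only a sketch under stated assumptions: `Module`, `Thread`, and `cpu_by_module` are hypothetical names, the addresses and CPU times are invented, and the Win32 calls that would supply real values are not shown. It also inherits the start-address assumption of the pseudo code.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Module(NamedTuple):
    name: str
    base: int  # module base address
    size: int  # module size in bytes


class Thread(NamedTuple):
    tid: int
    start_address: int  # thread start address
    cpu_seconds: float  # kernel + user time


def cpu_by_module(threads: List[Thread], modules: List[Module]) -> Dict[str, float]:
    """Sum CPU time per module, attributing each thread to the module
    whose address range contains the thread's start address."""
    totals = defaultdict(float)
    for thread in threads:
        owner = next(
            (m.name for m in modules
             if m.base <= thread.start_address < m.base + m.size),
            "<unknown>",  # start address is outside every known module
        )
        totals[owner] += thread.cpu_seconds
    return dict(totals)


# Invented sample data for one svchost process.
modules = [Module("wsmsvc.dll", 0x1000, 0x1000), Module("wecsvc.dll", 0x4000, 0x800)]
threads = [
    Thread(448, 0x1200, 1.5),
    Thread(392, 0x1FFF, 0.5),
    Thread(520, 0x4100, 2.0),
    Thread(492, 0x9000, 0.25),
]
print(cpu_by_module(threads, modules))
```

Threads whose start address matches no module are kept under a separate key rather than dropped, so the per-module totals still add up to the process total.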

In general I think it would be nice to optionally provide more process details (like what DLLs are loaded, what privileges the process token has, etc). If you are interested in contributing some Windows enhancements to the project I would be happy to discuss it more.


(Jean-Pierre) #3

Interested, but unfortunately I'm only somewhat comfortable with PowerShell/bash/Python, so working in Go plus Windows C++ internals will take some serious doing for me. That said, I am motivated to get some metrics on how much CPU the wsmsvc.dll and wecsvc.dll threads consume for Windows Event Log collection.

For now, at least I can contribute some background research.

TL;DR:

  • An extra tricky step that is needed is walking the stack of the thread to find points at which it executes within the module's address space

    • the thread's start address won't necessarily lie within the DLL module's memory space, even if the thread was created to run functions in the DLL?
    • In addition, ASLR affects this?
  • For the most part, I think I agree with #2473 and maybe Metricbeat should factor in using Windows PDH?

In a nutshell, the use case is to try to get performance counters for the threads related to specific DLLs run in a Windows service (but could be extended to arbitrary processes, not just services).

I tried implementing the pseudo code in PowerShell as a PoC, but fell short because PowerShell doesn't have functions (that I know of) to walk the stack of a thread and find symbols (function entry points) related to the service DLLs. So I was unable to select the correct threads to capture performance counters for.

Thread memory use

Sure, trying to account for thread memory use is usually pointless. One of the benefits of threads is that they share a process' memory space efficiently (while processes need to use IPC or special memory mapping functions with shared memory objects). That said:

  • While threads share the program, data and heap process space, they consume their own stack space?
  • So reporting on thread stack space use could find run-away / bad threads (e.g. ones calling a deep recursive function)?

Ways to gather performance info on Windows

I did a fair amount of googling to find other approaches/examples before accepting the DLL vs thread memory address pointer hack would be appropriate. So far, I haven't found a neat example for this use case, but options to pull together the bits needed are either direct calls to OS debug helper functions, PDH (Performance Data Helper) or PSAPI (Process Status API).

More examples that use PDH:

GOOS / gosigar

I skimmed the beats repo and assume sigar_windows.go is where the thread/DLL related code would need to be added?

I noticed it doesn't use PDH or PSAPI. Perhaps the above is simpler and the pseudo code you suggested fits with this already implemented model, which, as far as I understand, uses the Windows process debug functions related to process snapshotting to do thread and module walking?

Nonetheless, this would explain why sigar_windows.go has a lot of checks for debug privileges. And even if PDH might not need debug privileges, for this use case they're probably required.

The one good working example I've seen tracing threads to DLLs is Sysinternals Process Explorer. On Stack Overflow, I read a post suggesting the IDebugSymbols::GetNameByOffset method was used.


(Jean-Pierre) #4

PDH Examples

PDH for example might be a better option to consider because:

  • Splunk uses it to monitor Windows performance, and I'm assuming their approach would've been battle tested.

  • In future, Metricbeat would be able to consume performance monitoring information from Windows programs that use the API, e.g.:

  • When using PDH, we might not need debug level privileges

However, getting modules/loaded DLLs for a process is a privileged operation, and PDH itself doesn't expose this.

Useful looking example query strings for PDH related to the use case:

  • process time: \Process(processname)\% Processor Time
  • thread time: \Thread(processname/threadnumber)\% Processor Time
  • resident memory: \Process(processname)\Working Set
  • IO (network and storage): \Process(processname)\IO Data Bytes/sec

Some basic wildcarding is supported, and it has process ID and thread ID info.
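
To make the path format concrete, here is a small Python sketch that only builds counter path strings; it does not call PDH. The helper names are hypothetical. The `svchost#1` instance name reflects how Windows numbers multiple instances of the same image, and `ID Process` is the Process counter that can be used to tie such an instance back to a PID.

```python
def pdh_path(obj: str, instance: str, counter: str) -> str:
    """Build a PDH counter path of the form \\Object(instance)\\Counter."""
    return "\\{}({})\\{}".format(obj, instance, counter)


def process_counter_paths(instance: str) -> dict:
    """Counter paths for one Process instance, e.g. 'svchost#1'."""
    return {
        "cpu": pdh_path("Process", instance, "% Processor Time"),
        "working_set": pdh_path("Process", instance, "Working Set"),
        "io": pdh_path("Process", instance, "IO Data Bytes/sec"),
        # ID Process reports the PID behind this instance name.
        "pid": pdh_path("Process", instance, "ID Process"),
    }


for name, path in process_counter_paths("svchost#1").items():
    print(name, path)
```

Passing `*` as the instance gives the wildcard form, e.g. `\Process(*)\ID Process`, which would be one way to discover which numbered svchost instance corresponds to the PID of interest.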

Other functions

Some other functions that relate or do similar things

For [NtQueryInformationThread](https://msdn.microsoft.com/en-us/library/windows/desktop/ms684283(v=vs.85).aspx)

For PSAPI Module Information


(Maddin2016) #5

Actually, I'm writing a beat which uses PDH to access performance counters because I have the same problem :). Maybe I can open a PR in the next few days, but it's at a very early stage.


(Andrew Kroh) #6

@JPvRiel Thanks for taking the time to research and condense the information on the various APIs for monitoring. It's very useful.

Yes.

The github.com/elastic/gosigar project is where we have been putting most of the system specific code. The sigar interface tries to generalize how common system info like CPU or disk usage is collected and hide the system specific implementation details.

I think that we can add features to the project that are not necessarily exposed through the sigar interface. These methods can just be used directly by projects like Metricbeat. For example, we could add functions for the PDH API to github.com/elastic/gosigar/tree/master/sys/windows and then add a new PDH metricset to Metricbeat.

