Eventing only 8.13.3 W11 24H2 High CPU Load - EventsQueueThread

Elastic Agent v8.13.3 w/ Elastic Endpoint on Windows 11 Enterprise 24H2 (10.0.26100.3194) doing only event collection (protections disabled) shows a baseline of 8% CPU load, with periodic sustained spikes up to 78%.

The endpoint diagnostics metrics.json shows EventsQueueThread as the primary contributor to the load (details below). Comparing the system_impact attributes across two collections taken a day apart, PowerShell or svchost generates the most events overall.

Trusted applications for the security solution are in place, and no overlap or conflicts are detected in a procmon capture.

This seems to result from general eventing load, but I am hoping to gain further insight into what EventsQueueThread is and how it might affect the system load.

Collected 1 day apart, the same thread appears as top load:

            "threads":
            [
                {
                    "cpu":
                    {
                        "mean": 4.842997495665575
                    },
                    "name": "EventsQueueThread"
                },
                {
                    "cpu":
                    {
                        "mean": 2.200709705534458
                    },
                    "name": "EventsQueueThread"
                },

=== Memory ===
Load: 45%
Physical: 32220 GB
Available: 17715 GB
Commit limit: 37340 GB

=== CPU ===
Name: Intel(R) Core(TM) Ultra 7 155U
Logical cores: 14

Please let me know if any additional data would be helpful.

Hi @t5r4e3. I'm sorry to hear you're having CPU issues.

You're probably right. That thread is responsible for async event enrichment. It enriches events with information such as digital signatures and parent process information (plus a lot more). Under heavy load, this can require a notable amount of CPU, though 4.8% isn't bad.

Top Command

We created the top command to help diagnose issues like this.
Elastic Endpoint command reference | Elastic Security Solution [8.17] | Elastic

Here's additional documentation.

In addition to the local execution method described in the docs, you can run this from the Response Console with:

execute --command ""C:\Program Files\Elastic\Endpoint\elastic-endpoint.exe" top --limit 5"

Divide and Conquer Policy

You can also try a divide and conquer approach.

  1. Disable all protections and all events. Verify CPU drops.
  2. Leaving all protections disabled, turn on the first 4 event sources.
  3. If CPU goes up, then one of them is likely the cause, so turn off half of the currently-enabled event sources and re-check CPU.
  4. If CPU remains low, then rule out all the currently-enabled event sources and enable half of those which have not yet been ruled out.
  5. Repeat steps 3 and 4 until you've narrowed it down to a single event source. This is basically a binary search. It should take 3 iterations to test all 8 event sources (4 -> 2 -> 1).
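To make the bisection in the steps above concrete, here is a rough Python sketch. It is only an illustration: the source names are placeholders rather than the actual policy option names, and `cpu_is_high` is a hypothetical callback standing in for the manual step of applying a policy with only those sources enabled and checking CPU. It also assumes a single event source is responsible for the load.

```python
from typing import Callable, List, Sequence, Tuple

# Placeholder labels; substitute the event sources shown in your own policy.
SOURCES: List[str] = [f"source-{i}" for i in range(1, 9)]


def find_noisy_source(
    sources: Sequence[str],
    cpu_is_high: Callable[[Sequence[str]], bool],
) -> Tuple[str, int]:
    """Bisect the event sources, assuming exactly one of them causes the load.

    cpu_is_high(enabled) is a hypothetical check: in practice, enable only
    the sources in `enabled`, wait for the policy to apply, and observe CPU.
    Returns the suspected source and the number of checks performed.
    """
    candidates = list(sources)
    checks = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        checks += 1
        if cpu_is_high(half):
            # The culprit is among the enabled half.
            candidates = half
        else:
            # Rule out the enabled half; keep the untested remainder.
            candidates = candidates[len(half):]
    return candidates[0], checks


if __name__ == "__main__":
    culprit = "source-6"  # simulated culprit, for demonstration only
    found, checks = find_noisy_source(SOURCES, lambda enabled: culprit in enabled)
    print(found, checks)
```

If the load is spread across several event sources rather than concentrated in one, the halves will not separate cleanly and this approach only narrows things down rather than isolating a single cause.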

Note that enabling Behavioral Protections will cause Defend to internally collect and enrich all event sources. They are required for behavioral protections, even if they are never sent to Kibana.

System Impact

Since you're already looking at metrics documents, the week_ms fields under the system_impact portion of the metrics document can give you a sense of which processes' activity is causing Defend to process events. Ignore week_idle_ms. The top command analyzes this information over several seconds to generate its output.
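If you'd rather script the comparison between two collections, something like the following Python sketch could rank system_impact entries by their week_ms fields. The exact layout of the metrics document is an assumption here: the sketch just walks each entry, picks up any numeric field whose name ends in week_ms (which skips week_idle_ms), and sorts entries by their largest value. The sample data is made up for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


def week_ms_fields(node: Any, path: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted_path, value) for numeric fields whose key ends in 'week_ms'.

    'week_idle_ms' does not end in 'week_ms', so it is skipped.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key.endswith("week_ms") and isinstance(value, (int, float)):
                yield child, value
            else:
                yield from week_ms_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from week_ms_fields(item, f"{path}[{index}]")


def rank_entries(entries: List[Dict[str, Any]], top: int = 5):
    """Return up to `top` (peak_week_ms, fields, entry) tuples, largest first."""
    ranked = []
    for entry in entries:
        fields = dict(week_ms_fields(entry))
        if fields:
            ranked.append((max(fields.values()), fields, entry))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Hypothetical entries; real field names and nesting may differ.
    sample = [
        {"process": {"executable": "powershell.exe"},
         "file_events": {"week_ms": 1200},
         "week_idle_ms": 999999},
        {"process": {"executable": "svchost.exe"},
         "registry_events": {"week_ms": 3400},
         "process_events": {"week_ms": 150}},
    ]
    for peak, fields, entry in rank_entries(sample):
        print(peak, entry["process"]["executable"], fields)
```

Ranking by the largest single field is just a heuristic to avoid double counting if the document also carries an aggregate value per process; adjust it once you've confirmed the real field names in your metrics.json.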
