Endpoint agent consistently at 90+% CPU on some PCs

Hi,

I have recently been rolling out the Elastic Endpoint Agent to some clients for testing.
As part of the policy, they are pushed the Endpoint Security, System and Windows integrations.

For certain clients they are getting consistently high CPU usage from Elastic Endpoint, Filebeat and Metricbeat.

Given the high utilisation across all three apps, my assumption was an environmental issue. However, I have confirmed that the endpoints could communicate with Elasticsearch for well over a 24-hour period, and that during this time there was adequate storage on the cluster to accept the incoming data.

Is there anything else that could be preventing the agents from shipping the relevant logs?
Can you also suggest a way for me to troubleshoot this, or to gather the relevant logs to send to you?

Thanks in advance.

Hi @The1WhoPrtNocks, thanks for trying out Elastic Security.

Do you see data from Endpoint, Filebeat, and Metricbeat being stored in Elasticsearch? The most likely cause for the high CPU use is some activity on your computer that these three are trying to monitor. By chance, are you running them on one of the Elasticsearch nodes?

A first step to triage this and help narrow down what is causing the CPU spike is to try turning off features and see when CPU use drops. For Endpoint, can you edit the Endpoint Policy for the Endpoint on the Administration page in the Security App? Try turning all features off (i.e. turn off Malware prevention and turn off all event collection). CPU use for Endpoint should hopefully drop to near zero. Then try turning on features one-by-one (I'd start with events, then move to re-enabling malware detection/prevention) and see what feature causes CPU to spike. Once we know what is causing the high CPU use we can work toward a workaround.
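
To put numbers on "see when CPU use drops" while toggling features, one option is to compare two process snapshots taken a few minutes apart. The following is only a rough sketch, not an Elastic tool: it assumes an English-locale Windows host where `tasklist /v /fo csv` reports the "Image Name" and "CPU Time" (h:mm:ss) columns, and the helper names `snapshot`, `cpu_seconds_by_image`, and `cpu_delta` are made up for illustration.

```python
import csv
import io
import subprocess


def snapshot():
    """Capture verbose tasklist output as CSV text (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/v", "/fo", "csv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def cpu_seconds_by_image(tasklist_csv):
    """Sum cumulative CPU time in seconds per image name.

    Assumes the "Image Name" and "CPU Time" column headers of an
    English-locale tasklist; localized systems use other headers.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        hours, minutes, seconds = (int(p) for p in row["CPU Time"].split(":"))
        name = row["Image Name"].lower()
        totals[name] = totals.get(name, 0) + hours * 3600 + minutes * 60 + seconds
    return totals


def cpu_delta(before, after):
    """CPU seconds consumed per image between two snapshots."""
    delta = {}
    for name, secs in after.items():
        used = secs - before.get(name, 0)
        if used > 0:
            delta[name] = used
    return delta
```

Take one snapshot before changing a policy setting and one a few minutes after, then compare the delta for elastic-endpoint.exe, filebeat.exe, and metricbeat.exe against the elapsed wall-clock time.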

Hi @ferullo,

Sorry for the silence. No, we are not running it on any Elasticsearch nodes; they are all on "generic" endpoints/laptops.

Through some further testing, I believe we have isolated it to the endpoint/security policy. I will go through your suggestions on disabling certain features and see how it improves.

Hi @ferullo,

It seems that it was the malware detection option. I disabled that and the users' issues have not reappeared since. We are not planning on using that feature currently, although we are considering full Endgame in the future.

So for me I would mark this as resolved.

Thanks

He's not alone and is correct on Endpoint being the issue.

I have yet to find a single process that is the direct cause; it happens on dozens of machines with different configurations, and they all do it at some point. I would love to see a CPU limit switch like pretty much all modern AV has now. Even Windows Defender can be set to a maximum CPU percentage for a file-level scan. Limiting process-driven activity, well, that's a little harder...

The biggest thing I've noticed is that when TiWorker kicks off, CPU jumps with it. It doesn't happen every time, mind you; it seems tied to when the Microsoft malware removal tool runs with the updates. It also hates the Windows certificate authority servers for some reason. If you try Elastic Endpoint on a machine running Streamlabs OBS, recording drops to 1 FPS, and as soon as you disable Endpoint it jumps back to normal. Just my 2 cents on what I've seen.

@ferullo @PublicName

I would second a CPU throttling option, nice catch-all solution.

@The1WhoPrtNocks and @PublicName I'd like to attempt to reproduce this issue in a development environment to better understand the issue and come up with a workaround. What version of the Endpoint are you using? This information can be found in Kibana at Security -> Administration -> Endpoints.

7.9.1 to 7.9.3, and 7.10.1 to 7.10.2.

If you run the October, December, or January updates for Windows, it triggers 90% of the time. If I don't forget, I'll screenshot it for you; you can clearly see TiWorker and Endpoint each using 50% CPU, which then starves your applications. 7.9.3 is better, as at least it no longer has the runaway memory usage that 7.9.0 had.

Hello @PublicName. It sounds like you may be running multiple endpoints. We introduced performance improvements in 7.10 to address antimalware CPU utilization. Can you confirm that you experienced the TiWorker.exe CPU issue on 7.10.2?

I can confirm that 7.10.2 still has it. I'll snip it when I find a machine that needs updating. The testing fleet is rather large, so I'll find one sooner or later; hopefully I can give you something in the next few hours. I'm hoping it triggers the Windows malware removal tool, since that's when you really see it.

7.10.2 is much better in dozens of ways! Still has some annoyances but so very close.

7.10.2 @ 4:21 PST 2/1/2020

This is common on all Windows platforms: 7, 8, 8.1, and 10. It can also be triggered with DISM, but not reliably.

Can also be triggered with Disk Cleanup and selecting the option to remove windows update files.

Hi,

Sorry for going quiet on here. All the agents we have pushed out are 7.10.1.
I have not found a consistent factor as to what's causing this, but I will go back and try to get more info.

I ran some updates on an old Windows 10 machine that was sorely out of date. The Elastic Endpoint records all file change events that happen on the system. Using these logs and Lens, I was able to find which processes were responsible for the file activity within the suspected time window. I will provide a screenshot of the results below. As suspected, the Windows Update service is responsible for quite a bit of file write activity on my host machine.

To create the query, I went to Visualize -> Create. In the search bar I entered host.hostname : "TARGET_HOSTNAME". Then I clicked Add Filter and added event.type: is one of change, deletion, creation.
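
For anyone who prefers the query API over Lens, the same breakdown can be approximated with a terms aggregation. This is a minimal sketch rather than what Lens runs internally: the function name `file_activity_query` is made up, the time bounds are placeholders, and I'm assuming the file events live in a `logs-endpoint.events.file-*` index pattern with ECS fields (`host.hostname`, `event.type`, `process.name`).

```python
import json


def file_activity_query(hostname, start, end, top_n=10):
    """Build a search body that counts file events per process on one host.

    Mirrors the Lens filters above: a hostname match plus event.type
    restricted to change, deletion, or creation.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the raw events
        "query": {
            "bool": {
                "filter": [
                    {"term": {"host.hostname": hostname}},
                    {"terms": {"event.type": ["change", "deletion", "creation"]}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "aggs": {
            # Busiest processes by number of file events
            "by_process": {"terms": {"field": "process.name", "size": top_n}}
        },
    }


if __name__ == "__main__":
    # Placeholder hostname and time window; adjust to the suspected window.
    print(json.dumps(file_activity_query("TARGET_HOSTNAME", "now-1h", "now"), indent=2))
```

The printed body can be pasted into Dev Tools as the body of a `_search` request against the index pattern assumed above.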

It might be worth creating a Trusted Application entry to help avoid processing some of these windows update file writes. Trusted Applications are located at Security > Administration > Trusted applications. We suggest adding the full path to TiWorker.exe as well as the signature (Microsoft Windows).

We do not recommend putting svchost.exe on the Trusted Application list as it can open you up to security vulnerabilities. Another important note is that this was all done with the 7.11 Agent and Endpoint. We are continually adding features to allow users to tweak their individual installations.

Hi Matt,

thanks for this feedback, in particular your approach to diagnosing this. Can you please clarify what hashing algorithm is expected in the trusted application hash section. I do not seem to be able to get the "signature" option to appear.

Thanks in advance

The input box will accept any md5, sha1, or sha256 file hash. These are the three options I have when using a 7.11 installation.

Thank you for clarifying the accepted hashes.
Our dev version is currently 7.10.2, which explains the missing option.
