Endpoint agent consistently at 90+% CPU on some PCs

Hi,

I have recently been rolling out the Elastic Endpoint Agent to some clients for testing.
As part of the policy, they are pushed Endpoint Security, System and Windows.

Certain clients are seeing consistently high CPU usage from Elastic Endpoint, Filebeat and Metricbeat.

Given the high utilisation across all three apps, my assumption was an environmental issue. However, I have confirmed that the endpoints could communicate with Elasticsearch for well over a 24-hour period. I have also confirmed that during this time there was adequate storage on the cluster to accept the incoming data.

Is there anything else that could be preventing the agents from shipping the relevant logs?
Can you also suggest a way for me to troubleshoot this, or to gather the relevant logs to send to you?

Thanks in advance.

Hi @The1WhoPrtNocks, thanks for trying out Elastic Security.

Do you see data from Endpoint, Filebeat, and Metricbeat being stored in Elasticsearch? The most likely cause for the high CPU use is some activity on your computer that these three are trying to monitor. By chance, are you running them on one of the Elasticsearch nodes?

A first step to triage this and help narrow down what is causing the CPU spike is to try turning off features and see when CPU use drops. For Endpoint, can you edit the Endpoint Policy for the Endpoint on the Administration page in the Security App? Try turning all features off (i.e. turn off Malware prevention and turn off all event collection). CPU use for Endpoint should hopefully drop to near zero. Then try turning on features one-by-one (I'd start with events, then move to re-enabling malware detection/prevention) and see what feature causes CPU to spike. Once we know what is causing the high CPU use we can work toward a workaround.
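The one-by-one procedure above can be sketched as a small loop. This is only an illustrative sketch under stated assumptions: `find_costly_feature`, `measure_cpu`, the feature names, and the 50% threshold are all hypothetical and not part of any Elastic API. In practice each "measurement" means changing the policy in the Security App, waiting for it to apply, and reading the Endpoint CPU figure by hand.

```python
from typing import Callable, Iterable, Optional, Set


def find_costly_feature(
    features: Iterable[str],
    measure_cpu: Callable[[Set[str]], float],
    threshold: float = 50.0,  # hypothetical cut-off for "high" CPU, in percent
) -> Optional[str]:
    """Start with everything off, then enable features one at a time.

    Returns the first feature whose activation pushes CPU to or above
    the threshold, or None if no feature does.
    """
    enabled: Set[str] = set()

    # Baseline: with all features off, CPU should be near zero.
    if measure_cpu(set(enabled)) >= threshold:
        raise ValueError("CPU is high with all features off; look elsewhere")

    for feature in features:
        enabled.add(feature)
        # Pass a copy so the measuring function cannot mutate our state.
        if measure_cpu(set(enabled)) >= threshold:
            return feature
    return None


# Stand-in for a manual reading: pretends that only a (hypothetical)
# "malware" feature is expensive.
def fake_measure(enabled: Set[str]) -> float:
    return 92.0 if "malware" in enabled else 3.0


culprit = find_costly_feature(["events", "malware"], fake_measure)
print(culprit)
```

Enabling features cumulatively, rather than one in isolation each time, matches the order suggested above (events first, then malware detection/prevention), and also surfaces a feature that only becomes expensive in combination with the ones enabled before it.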

Hi @ferullo,

Sorry for the silence. No, we are not running it on any Elasticsearch nodes; they are all on "generic" endpoints/laptops.

I believe that, through some further testing, we have isolated it to the endpoint/security policy. I will go through your suggestions on disabling certain features and see how it improves.

Hi @ferullo,

It seems that it was the malware detection option. I disabled that and none of the users' issues have shown up since. We are not planning on using that feature currently, although we are considering Full Endgame in the future.

So for me I would mark this as resolved.

Thanks

He's not alone and is correct on Endpoint being the issue.

I have yet to find a single process that would be a direct cause. It happens on dozens of machines with different configurations, and they will all do it at some point in time. I would love to see a CPU limit switch like pretty much all modern AV has now. Even Windows Defender can be set to a maximum CPU percentage when doing a file-level scan. Process-driven scanning, well, that's a little harder...

The biggest thing I've noticed is that when TIWorker kicks off, CPU will jump with it. It's not every time, mind you; it's almost like it's tied to when the Microsoft malware removal tool runs with the updates. It also hates the Windows certificate authority servers for some reason. If you happen to try out Elastic Endpoint on a machine running Streamlabs OBS, you drop to 1 FPS on recording; as soon as you disable it, recording jumps back to normal. Just my 2 cents on what I've seen.


@ferullo @PublicName

I would second a CPU throttling option, nice catch-all solution.
