Elastic Agent - critical issues, filling up hard drive space

@ferullo

I think there might be a critical bug in Elastic Agent. Today the system drives on all our servers filled up: Server 2012, 2016 and 2019, as well as Windows 10 machines.

Files in C:\Program Files\Elastic\Endpoint\state\log\ were occupying the space.

The file names ran from endpoint-000008.log to endpoint-003363.log.

So roughly 3,360 files in the folder (using about 80 GB of space).

I tried to bulk delete the files, but those from 000008 to 003357 threw an "access denied" error, while 003357 and above deleted just fine.

This sounds like the exact same problem.

Agent version 7.16

A reboot seemed to correct the problem and most of the files went away, but I'm assuming the problem will recur.

(My cluster was also unreachable, but I don't think that should be normal behaviour regardless; the agent should start dropping files before it fills the drive.)

Thanks

Well, that's not good.

I tried to reproduce this with 7.16.2 on Windows 10 but was not able to. After installing Agent and adding the Endpoint Security integration, I set Agent's log level to debug (to make that target directory fill up faster) and then disabled networking on the host. The endpoint-*.log files aged off properly, unlike what you saw.

Are you positive the Agent and Endpoint Security versions were 7.16? Could you check the output from "c:\Program Files\Elastic\Endpoint\elastic-endpoint.exe" version to be sure?

Are you able to reproduce this? If so, could you share the output from dir /S "c:\Program Files\Elastic\Endpoint"? Depending on what that shows, we can dig in further. You should never see more than about 300MB of files in c:\Program Files\Elastic\Endpoint\state\log. If you see more than that, you've reproduced it.
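If it helps to check for that condition without reading through the dir /S output by hand, here is a rough Python sketch (not an Elastic tool) that counts the files directly in that log directory, totals their size, and compares the total with the ~300MB figure. The path and the 300MB number come from the reply above; the function name log_dir_usage and treating "300MB" as 300 MiB are my own assumptions, and Endpoint's actual limit may differ. You may need an elevated prompt to read that directory.

```python
from pathlib import Path

# Path and the ~300 MB ceiling come from the reply above. Treating
# "300MB" as 300 MiB is an assumption; Endpoint's real limit may differ.
LOG_DIR = Path(r"C:\Program Files\Elastic\Endpoint\state\log")
LIMIT_BYTES = 300 * 1024 * 1024


def log_dir_usage(log_dir: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for regular files directly in log_dir."""
    if not log_dir.is_dir():
        # Missing directory (e.g. Endpoint not installed): report nothing.
        return 0, 0
    count = 0
    total = 0
    for path in log_dir.iterdir():
        if path.is_file():  # skip subdirectories
            count += 1
            total += path.stat().st_size
    return count, total


def main() -> None:
    count, total = log_dir_usage(LOG_DIR)
    print(f"{count} files, {total / (1024 * 1024):.1f} MB in {LOG_DIR}")
    if total > LIMIT_BYTES:
        print("More than ~300 MB: this looks like a reproduction.")


if __name__ == "__main__":
    main()
```

Save it under any name (for example a hypothetical check_endpoint_logs.py) and run it with Python 3.9 or newer; if it reports more than ~300 MB, the dir /S output requested above is the next thing to share.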