Looking at the "[Elastic Agent] Agent Info" dashboard, I noticed quite a number of hosts reporting a lot of errors in "[Elastic Agent] Agents with Errors".
Looking closer, it seems /opt/Elastic/Endpoint/state/log/endpoint-000XXX.log is being spammed with log lines like the following:
{"@timestamp":"2024-08-19T08:35:02.118765865Z","agent":{"id":"8d94b26b-7f18-41f8-8c83-e01184b52d95","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":175,"name":"ProcFile.cpp"}}},"message":"ProcFile.cpp:175 Error parsing file /proc/1578/stat","process":{"pid":17357,"thread":{"id":27120}}}
and
{"@timestamp":"2024-08-19T08:35:22.387318797Z","agent":{"id":"8d94b26b-7f18-41f8-8c83-e01184b52d95","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":701,"name":"ProcFile.cpp"}}},"message":"ProcFile.cpp:701 Unable to open smaps file for PID 10987","process":{"pid":17357,"thread":{"id":26569}}}
grep ProcFile.cpp:701 endpoint-000358.log | wc -l
10359
grep ProcFile.cpp:175 endpoint-000358.log | wc -l
41644
That file is only about 1 hour old.
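For anyone who wants to tally these in one pass instead of running one grep per line number, here is a minimal sketch in Python. It assumes the log holds one JSON object per line, shaped like the two samples above. The function name `count_by_origin` is only illustrative and is not part of any Elastic tooling.

```python
import json
from collections import Counter


def count_by_origin(lines):
    """Count error-level entries per (source file, source line) origin.

    `lines` is any iterable of strings, e.g. an open log file.
    Lines that are not valid JSON objects are skipped.
    """
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        log = entry.get("log") or {}
        if log.get("level") != "error":
            continue
        origin = (log.get("origin") or {}).get("file") or {}
        counts[(origin.get("name"), origin.get("line"))] += 1
    return counts
```

Passing an open handle for endpoint-000358.log and calling `.most_common()` on the result should list the noisiest origins first, which may help spot other errors hidden behind the ProcFile.cpp ones.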
This is the host where the problem is most apparent, but there are many more. They mostly seem to be KVM hosts running Ubuntu 18.04.X LTS and 20.04.X LTS.
I tried to catch some of the /proc/<PID>/stat files, but they didn't exist (anymore). Is that just expected from short-lived processes?
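To illustrate the short-lived-process hypothesis, here is a small Python sketch. It is only an assumption about what may be happening and says nothing about how Endpoint actually reads these files. It starts a child that exits immediately, waits for it, and then tries to read its stat file. The helper names `read_proc_stat` and `stat_after_exit` are made up for this example.

```python
import subprocess
import sys


def read_proc_stat(pid):
    """Return the contents of /proc/<pid>/stat, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        # The entry disappears once the process has exited and been reaped;
        # a read can also fail if the process dies after the file was opened.
        return None


def stat_after_exit():
    """Run a child that exits at once, then try to read its stat file."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # child has exited and been reaped at this point
    return read_proc_stat(child.pid)
```

On a Linux host, `stat_after_exit()` would be expected to return None, which is consistent with a reader that picks up a PID and only opens the file after the process has already gone away.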
The dashboard is not very helpful for identifying other issues while these spam messages keep piling up.
Sebastian