After upgrading from 8.12.1 to 8.13.0, the endpoint service on Windows Server 2022 machines is constantly crashing and restarting. This is not happening on Server 2016 or 2019 machines. The agent output is to Elasticsearch.
Looking through the endpoint logs there aren't any messages to indicate why the service crashed, and the crash appears to happen at different spots in the program.
Here are the logs from the time of one crash. Note the pid change (10416 to 3348), which indicates a service restart:
{"@timestamp":"2024-03-29T18:32:05.7618361Z","agent":{"id":"8dc34bcb-c361-4469-9c3f-1dc6ded7f4a5","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"debug","origin":{"file":{"line":1095,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:1095 Channel connectivity state: 2","process":{"pid":10416,"thread":{"id":5628}}}
{"@timestamp":"2024-03-29T18:32:06.7635506Z","agent":{"id":"8dc34bcb-c361-4469-9c3f-1dc6ded7f4a5","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"debug","origin":{"file":{"line":1095,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:1095 Channel connectivity state: 2","process":{"pid":10416,"thread":{"id":5628}}}
{"@timestamp":"2024-03-29T18:32:23.642367Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":226,"name":"Logging.cpp"}}},"message":"Logging.cpp:226 Endpoint info: version: 8.13.0, compiled: Wed Mar 20 21:00:00 2024, branch: HEAD, commit: f90579240155fc17f659ed37f7864ab1194ed2ea","process":{"pid":3348,"thread":{"id":3148}}}
Here are the logs from another crash (pid 3348 to 540):
{"@timestamp":"2024-03-29T18:39:23.7973516Z","agent":{"id":"8dc34bcb-c361-4469-9c3f-1dc6ded7f4a5","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"debug","origin":{"file":{"line":257,"name":"Utility.cpp"}}},"message":"Utility.cpp:257 Document logging directory is: C:\\Program Files\\Elastic\\Endpoint\\state\\documents","process":{"pid":3348,"thread":{"id":7820}}}
{"@timestamp":"2024-03-29T18:39:23.7976467Z","agent":{"id":"8dc34bcb-c361-4469-9c3f-1dc6ded7f4a5","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"debug","origin":{"file":{"line":358,"name":"DocumentLogging.cpp"}}},"message":"DocumentLogging.cpp:358 Document logging directory size: 110656","process":{"pid":3348,"thread":{"id":7820}}}
{"@timestamp":"2024-03-29T18:39:24.4095944Z","agent":{"id":"8dc34bcb-c361-4469-9c3f-1dc6ded7f4a5","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"debug","origin":{"file":{"line":1095,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:1095 Channel connectivity state: 2","process":{"pid":3348,"thread":{"id":5936}}}
{"@timestamp":"2024-03-29T18:39:39.7810079Z","agent":{"id":"","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":226,"name":"Logging.cpp"}}},"message":"Logging.cpp:226 Endpoint info: version: 8.13.0, compiled: Wed Mar 20 21:00:00 2024, branch: HEAD, commit: f90579240155fc17f659ed37f7864ab1194ed2ea","process":{"pid":540,"thread":{"id":7828}}}
There appears to be no consistency in what is causing the service restart. Memory and CPU usage for the service appear normal at the time of the crash.
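For anyone who wants to check their own logs for the same pattern, here is a rough sketch of how the restarts could be located automatically. It assumes the endpoint log is newline-delimited JSON with a `process.pid` field, as in the excerpts above. The function name `find_restarts` and the command-line usage are only illustrative, and the log file path is something you would supply yourself.

```python
import json
import sys


def find_restarts(lines):
    """Return (last_event_before, first_event_after) pairs where process.pid changes.

    Assumes each line is one JSON log event shaped like the excerpts above.
    Lines that are not valid JSON or have no process.pid are skipped.
    """
    restarts = []
    prev = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        process = event.get("process")
        if not isinstance(process, dict) or process.get("pid") is None:
            continue
        # A pid change between consecutive events suggests the service restarted.
        if prev is not None and prev["process"]["pid"] != process["pid"]:
            restarts.append((prev, event))
        prev = event
    return restarts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python find_restarts.py <path-to-endpoint-log>
    with open(sys.argv[1], encoding="utf-8") as handle:
        for before, after in find_restarts(handle):
            print("last message before restart:", before.get("message"))
            print("  at", before.get("@timestamp"), "pid", before["process"]["pid"])
            print("  new pid", after["process"]["pid"], "at", after.get("@timestamp"))
```

Collecting the last message before each pid change across many crashes would make it easier to confirm whether the crash point really is different every time.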
Uninstalling and reinstalling endpoint does not fix the issue. Changing the agent to a different fleet policy, without endpoint, then changing back to a policy with endpoint also does not fix the issue. A reboot of the server also does not fix the issue.
Is anyone else seeing this issue?