Version: 8.9.0
Hi there! I work on a SOC team that manages agents on many local endpoints within our network. I'm fairly new to the stack, so I'm hoping someone with more experience troubleshooting agent issues can help out. An issue we've been working around for a while is that periodically (maybe a day or so after being successfully enrolled and sending logs to our stack), agents go offline and never come back unless they are reinstalled. After a reinstall they work fine again until the next time they go offline.
I logged into two endpoints (one enrolled and working, one enrolled but not working), ran .\elastic-agent.exe status from the C:\Program Files\Elastic\Agent directory, and got the following outputs:
The healthy/working agent outputs:
┌─ fleet
│ └─ status: (HEALTHY) Connected
└─ elastic-agent
  └─ status: (HEALTHY) Running
The enrolled but broken/offline agent outputs:
Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing open \\\\.\\pipe\\elastic-agent-system: The system cannot find the file specified." For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.9/fleet-troubleshooting.html
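Since the two outputs above are so different, the same check could be scripted to sort many endpoints into "healthy" versus "daemon not reachable". Below is a rough Python sketch of that idea, not a tested tool. The install path and the `status` subcommand come from the commands above; the function names, the three labels, and the string-matching rules are my own assumptions and would likely need adjusting for other status outputs.

```python
import subprocess

# Path taken from the post above; adjust if the agent is installed elsewhere.
AGENT_EXE = r"C:\Program Files\Elastic\Agent\elastic-agent.exe"


def classify_status(output: str) -> str:
    """Roughly classify the text printed by `elastic-agent status`.

    The labels and matching rules are illustrative assumptions,
    not an official Elastic scheme.
    """
    text = output.lower()
    # Seen on the broken endpoint: the CLI cannot reach the agent daemon at all.
    if "failed to communicate with elastic agent daemon" in text:
        return "daemon-unreachable"
    # Seen on the working endpoint: every "status:" line is marked (HEALTHY).
    status_lines = [line for line in text.splitlines() if "status:" in line]
    if status_lines and all("(healthy)" in line for line in status_lines):
        return "healthy"
    # Anything else (degraded, empty output, unknown format) needs a human look.
    return "needs-review"


def check_local_agent(timeout: int = 30) -> str:
    """Run the status command on this endpoint and classify what it prints."""
    result = subprocess.run(
        [AGENT_EXE, "status"],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",  # the tree output contains box-drawing characters
        timeout=timeout,
    )
    # The error message may land on stderr, so look at both streams.
    return classify_status(result.stdout + result.stderr)
```

Calling `check_local_agent()` on an endpoint would return one of the three labels. A "daemon-unreachable" result only says that the CLI could not open the agent's named pipe, which most likely means the agent service is not running on that machine; it does not explain why the service stopped.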