Hi @jsu. Sorry you're hitting these problems. It seems like you may have two different issues.
The DEGRADED issue
Could you look in Endpoint's logs around the time a DEGRADED message appears in the Agent Activity Log to see whether there is any indication of what Endpoint was doing when this happened? Please make sure to adjust for any time zone differences between the timestamps in Endpoint's logs and those in Kibana.
In particular, I'm curious about the following:
- Is Endpoint actively applying Policy, or did it just apply Policy? (You should see lots of logs with the string "found in config" in them.) The bug that was fixed in 7.9.1 was caused by Endpoint being slow to keep its heartbeat with Agent when applying Policy.
- Is Endpoint crashing? Endpoint's PID is in each log message, so if it suddenly changes, that is an indication it may be crashing. If the PID changes, it could also be Elastic Agent restarting or reinstalling Endpoint because it considers Endpoint very unhealthy. Systemd's syslogs would also show whether Endpoint is crashing.
- Does Endpoint think it has an active connection to Agent when it becomes DEGRADED, or does it know it is disconnected? Check the Endpoint logs (/opt/Elastic/Endpoint/state/log/) to see its status. Below are two commands I used locally.
vagrant@ubuntu:~$ sudo bash -c "grep 'AgentConnectionInfo.cpp' /opt/Elastic/Endpoint/state/log/*"
{"@timestamp":"2020-09-29T20:54:50.968161593Z","agent":{"id":"b5fe037e-4bc6-4644-94cf-3122c490db20","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":110,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:110 Validated agent is root/admin","process":{"pid":5989,"thread":{"id":6000}}}
{"@timestamp":"2020-09-29T20:54:50.973433891Z","agent":{"id":"b5fe037e-4bc6-4644-94cf-3122c490db20","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":118,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:118 Established stage 1 connection to agent","process":{"pid":5989,"thread":{"id":6000}}}
{"@timestamp":"2020-09-29T21:45:14.989756853Z","agent":{"id":"b5fe037e-4bc6-4644-94cf-3122c490db20","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":110,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:110 Validated agent is root/admin","process":{"pid":5989,"thread":{"id":6000}}}
{"@timestamp":"2020-09-29T21:45:14.990314390Z","agent":{"id":"b5fe037e-4bc6-4644-94cf-3122c490db20","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":118,"name":"AgentConnectionInfo.cpp"}}},"message":"AgentConnectionInfo.cpp:118 Established stage 1 connection to agent","process":{"pid":5989,"thread":{"id":6000}}}
vagrant@ubuntu:~$ sudo bash -c "grep 'Agent connection' /opt/Elastic/Endpoint/state/log/*"
{"@timestamp":"2020-09-29T20:55:11.982879236Z","agent":{"id":"b5fe037e-4bc6-4644-94cf-3122c490db20","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":592,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:592 Agent connection established.","process":{"pid":5989,"thread":{"id":6000}}}
{"@timestamp":"2020-09-29T21:45:36.5766533Z","agent":{"id":"b5fe037e-4bc6-4644-94cf-3122c490db20","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"info","origin":{"file":{"line":592,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:592 Agent connection established.","process":{"pid":5989,"thread":{"id":6000}}}
vagrant@ubuntu:~$
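For the PID question above, a quick way to spot a change is to pull the "pid" field out of every log line and collapse consecutive repeats. This is only a rough sketch: `pid_runs` is a helper name I made up (it is not part of Endpoint or Agent), it assumes every log line carries a `"pid":<number>` field like the lines pasted above, and it reads files in shell glob order, which may not be chronological if the logs have rotated.

```shell
# pid_runs: hypothetical helper, not an Endpoint or Agent tool.
# Prints each run of consecutive log lines that share a PID, prefixed by
# the number of lines in that run. More than one output row means the
# PID changed somewhere in the logs.
pid_runs() {
  # -o prints only the matched text, -h suppresses file name prefixes
  grep -oh '"pid":[0-9]*' "$@" | uniq -c
}

# Example call (the log directory needs root, so pass the function
# definition into the root shell):
#   sudo bash -c "$(declare -f pid_runs); pid_runs /opt/Elastic/Endpoint/state/log/*"
```

If this prints a single row, Endpoint kept the same PID for everything in those logs. If it prints several rows, grep for the new PID to find the timestamp of its first log line and compare that with when the DEGRADED message appeared.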
The Elasticsearch connection is down issue
Can you follow the steps in Endpoint 7.9 "Degraded and dashboards" to see if you are able to connect to Elasticsearch with Endpoint's config information via curl? There is no need to check the Kibana connection; in 7.9, Linux Endpoints do not have any reason to connect to Kibana.