Can you look at Endpoint's logs to see if there's anything in them that shows why it can't connect to Agent? From what you've shared, I assume that's the issue. On Linux the logs will be in /opt/Elastic/Endpoint/state/log/
Look for logs from files with the word "Agent" in their name (e.g. AgentComms.cpp, AgentConnectionInfo.cpp, AgentContext.cpp, etc.) and the logs surrounding them from the same thread (process.thread). Hopefully there are some logs that make it clear what the issue is. If it's not clear what's wrong from looking at them, feel free to share the relevant logs here (redacted as necessary), or if you have concerns about doing so, I don't mind if you DM them to me. Assuming the issue is the Agent<->Endpoint connection failing to establish, you should see repeated logs each time Endpoint tries to connect to Agent.
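If it's easier to script that search than to eyeball it, here is a rough sketch of the filter I have in mind. It assumes the Endpoint log files are newline-delimited JSON, with the source file name at log.origin.file.name and the thread at process.thread.id. Those field paths are my assumption, so check them against a real log line first. The function names are made up for illustration.

```python
import json


def dig(entry, *keys):
    """Walk nested dicts safely; return None if any level is missing."""
    for key in keys:
        if not isinstance(entry, dict):
            return None
        entry = entry.get(key)
    return entry


def agent_related_entries(lines, keyword="Agent"):
    """Keep log entries from files whose name contains `keyword`,
    plus every other entry logged by the same thread(s)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(entry, dict):
            entries.append(entry)

    def file_name(entry):
        # Assumed field path: log.origin.file.name
        return str(dig(entry, "log", "origin", "file", "name") or "")

    def thread_id(entry):
        # Assumed field path: process.thread.id
        tid = dig(entry, "process", "thread", "id")
        return tid if isinstance(tid, (int, str)) else None

    # Threads that logged at least once from an "Agent" source file.
    threads = {thread_id(e) for e in entries if keyword in file_name(e)}
    threads.discard(None)

    return [
        e for e in entries
        if keyword in file_name(e) or thread_id(e) in threads
    ]
```

You would feed it the lines of a log file from /opt/Elastic/Endpoint/state/log/ (read as root) and print the message field of whatever comes back.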
As an aside, starting in 8.7, if you run Endpoint's diagnostic command (sudo /opt/Elastic/Endpoint/elastic-endpoint diagnostics), the resulting zip file includes an analysis.txt file that contains some details about what might be happening in this case.
Thank you for the quick reply! @ferullo I requested that the system owner pull these logs and they reported that the /opt/Elastic/Endpoint/state/ directory is empty.
We will be upgrading to version 8.8 in the next few weeks.
That's surprising. Perhaps Agent is unable to install Endpoint. Does the directory /opt/Elastic/Endpoint exist at all? Bear in mind Endpoint's directory requires root permission to access, so if the user tried to view /opt/Elastic/Endpoint/state/log/ from a non-root process they won't be able to access it.
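To tell "the directory is empty" apart from "I'm not allowed to read it", something like the sketch below could be run once as root and once as the regular user. The function name and the returned labels are just illustrative, not anything Endpoint provides.

```python
import os


def inspect_dir(path):
    """Classify a directory as missing, unreadable, empty, or populated."""
    try:
        names = os.listdir(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        # Typical result for a non-root process on a root-only directory.
        return "permission denied"
    except NotADirectoryError:
        return "not a directory"
    return "populated" if names else "empty"


if __name__ == "__main__":
    for path in (
        "/opt/Elastic/Endpoint",
        "/opt/Elastic/Endpoint/state",
        "/opt/Elastic/Endpoint/state/log",
    ):
        print(path, "->", inspect_dir(path))
```

If the parent directory itself is root-only, a non-root run reports "permission denied" for the subdirectories whether or not they exist, which is why the root run is the one that matters.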
When I look at Agent logs in Elasticsearch for a given host, after the log "check if endpoint service is installed" I see lots of logs with FILENAME.cpp in them. Those logs are from Endpoint. Assuming Endpoint is not installing, do any of them help identify why it fails to install?
The user did try to access /opt/Elastic/Endpoint/state/log/ while using root.
This environment has log collection disabled, which limits what I can see from Elasticsearch.
I am seeing the following data under the agent details page in fleet, if this is any help:
Thanks for those screenshots. It looks like the files in /opt/Elastic/Endpoint, which are created when Endpoint is installed, exist but those that would be created when Endpoint runs for the first time do not exist. So I think we need to figure out why Endpoint is not running after being installed.
The same Endpoint executable is used to install Endpoint as to run it after installation, so since Endpoint is successfully installing those files we can tell that the Endpoint executable is able to run on this host. There is just something preventing it from running after it's been installed.
What type of system is this? What's the Linux distribution and version? Is SELinux enabled? Are there any other system restrictions or any other security software that might prevent Endpoint from running?
Is the Endpoint service running? (Endpoint supports systemd and upstart. The systemd status command is sudo systemctl status ElasticEndpoint. You can also look for a process with the command line /opt/Elastic/Endpoint/elastic-endpoint run)
If the service isn't running, can you start it? (sudo systemctl start ElasticEndpoint)
If you can't start the service, what happens if you run Endpoint manually? (sudo /opt/Elastic/Endpoint/elastic-endpoint run --log stdout --log-level DEBUG)
That will run Endpoint in the foreground. With that command you'll see a few log messages before Endpoint switches from using the command line log configuration to using the logging configuration in /opt/Elastic/Endpoint/elastic-endpoint.yaml . So it's to be expected that logging stops after a few messages if Endpoint keeps running.
If that doesn't work, what happens if you copy /opt/Elastic/Endpoint/elastic-endpoint to /opt/Elastic/Agent/elastic-endpoint and run it from there? There might be a system policy preventing Endpoint from running from within its own directory but not from Agent's. When Agent runs Endpoint in install mode, it launches elastic-endpoint from within its own directory (using an identical binary with the name endpoint-security).
Are there any system syslog messages that highlight what is happening?
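For the "look for a process with that command line" check above, here is a small sketch that scans /proc rather than relying on any particular ps flags. It assumes the running process shows exactly /opt/Elastic/Endpoint/elastic-endpoint as its first argument followed by run; if the service is launched differently on your system, the match would need loosening. The function names are hypothetical.

```python
import os

ENDPOINT_BINARY = "/opt/Elastic/Endpoint/elastic-endpoint"


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]


def is_endpoint_run(args):
    """True if the arguments look like 'elastic-endpoint run'."""
    return len(args) >= 2 and args[0] == ENDPOINT_BINARY and args[1] == "run"


def find_endpoint_pids(proc_root="/proc"):
    """Return sorted PIDs whose command line matches the Endpoint run command."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdecimal():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        if is_endpoint_run(parse_cmdline(raw)):
            pids.append(int(name))
    return sorted(pids)


if __name__ == "__main__":
    pids = find_endpoint_pids()
    print("Endpoint run process PIDs:", pids if pids else "none found")
```

An empty result alongside a populated /opt/Elastic/Endpoint would fit the "installed but not running" picture described earlier.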
Great, I'm glad to hear it. Just to make sure it's clear to you and anyone else reading this: although you can run Endpoint out of /opt/Elastic/Agent/, you shouldn't do so long term. Updating the system policy so Endpoint can run from its correct location is the long-term solution.
It might seem an attractive option to just update Endpoint's upstart/systemd service to point to the /opt/Elastic/Agent/ location elastic-endpoint was copied to. However, Endpoint's lifecycle is fully managed by Agent; Agent can reinstall Endpoint at any time, which rewrites that service configuration.