Elastic agent 8.4.3 - No policy response available

Hello,

The Elastic UI displays the following error: "No policy response available".

Status output:
./elastic-agent status

Status: FAILED 
Message: app endpoint-security--8.4.3-3a97d14d: Missed two check-ins 
Applications: 
* endpoint-security (FAILED)
                    Missed two check-ins 

Diagnostics output:

./elastic-agent diagnostics

elastic-agent   id: xxxx        version: 8.4.3 
                build_commit: xxxx build_time: 2022-10-04 10:34:00 +0000 UTC snapshot_build: false 
Applications : 
  * name: endpoint-security route key: default 
     error: Get "http://unix/": dial unix /opt/Elastic/Agent/data/tmp/default/endpoint-security/endpoint-security.sock: connect: no such file or directory

The agent has been uninstalled and reinstalled with services confirmed to be running.
The agent is actively communicating with Fleet.

Hi @elastic_fan (great username :wink: )

Can you look at Endpoint's logs to see if there's anything in them that shows why it can't connect to Agent? From what you've shared, I assume that's the issue. On Linux, the logs are in /opt/Elastic/Endpoint/state/log/

Look for logs from files with the word "Agent" in their name (e.g. AgentComms.cpp, AgentConnectionInfo.cpp, AgentContext.cpp), along with the logs surrounding them from the same thread (process.thread). Hopefully some of them make it clear what the issue is. If it's not clear what's wrong from looking at them, feel free to share the relevant logs here (redacted as necessary), or if you have concerns about doing so, I don't mind if you DM them to me. Assuming the issue is the Agent<->Endpoint connection failing to establish, you should see repeated logs each time Endpoint tries to connect to Agent.
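For anyone following along, here is a minimal sketch of that log search. It assumes the logs are text files and that source file names such as AgentComms.cpp appear verbatim in each log line. The function name `agent_log_lines` is made up for illustration, and the default directory is the one mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines that mention an Agent*.cpp source file.
# Assumption: Endpoint source file names appear verbatim in the log text.
agent_log_lines() {
  local log_dir="${1:-/opt/Elastic/Endpoint/state/log}"
  # -r: recurse into the directory, -h: omit file names, -E: extended regex
  grep -rhE 'Agent[A-Za-z]*\.cpp' "$log_dir"
}

# Usage (as root, since Endpoint's directory is root-only):
#   agent_log_lines | tail -n 50
```

This only pulls the Agent-related lines themselves. To see the surrounding lines from the same thread you would still need to filter on the thread field of the matching entries by hand.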

As an aside, starting in 8.7, Endpoint's diagnostics command (sudo /opt/Elastic/Endpoint/elastic-endpoint diagnostics) produces a zip file that includes an analysis.txt file with some details about what might be happening in a case like this.

Thank you for the quick reply! @ferullo I requested that the system owner pull these logs and they reported that the /opt/Elastic/Endpoint/state/ directory is empty.

We will be upgrading to version 8.8 in the next few weeks.

That's surprising. Perhaps Agent is unable to install Endpoint. Does the directory /opt/Elastic/Endpoint exist at all? Bear in mind Endpoint's directory requires root permission to access, so if the user tried to view /opt/Elastic/Endpoint/state/log/ from a non-root process they won't be able to access it.

When I look at Agent logs in Elasticsearch for a given host, after the log "check if endpoint service is installed" I see lots of logs with FILENAME.cpp in them. Those logs are from Endpoint. Assuming Endpoint is not installing, do any of them help identify why it fails to install?

Thank you for your help.

The user did try to access /opt/Elastic/Endpoint/state/log/ while using root.
This environment has log collection disabled, which limits what I can see from Elasticsearch.

I am seeing the following data under the agent details page in Fleet, if this is any help:

  "components": [
    {
      "id": "endpoint-default",
      "type": "endpoint",
      "status": "FAILED",
      "message": "Failed: endpoint service missed 3 check-ins",
      "units": [
        {
          "id": "endpoint-default-xxxxxxxx",
          "type": "input",
          "status": "FAILED",
          "message": "Failed: endpoint service missed 3 check-ins"
        },
        {
          "id": "endpoint-default",
          "type": "output",
          "status": "FAILED",
          "message": "Failed: endpoint service missed 3 check-ins"
        }
      ]
    }
  ],

I am going to confirm that /opt/Elastic/Endpoint/ exists and see what contents are inside.

Hello again @ferullo , here are the contents of /opt/Elastic/Endpoint/


I also have a diagnostic bundle with the following items inside

Thanks for those screenshots. It looks like the files in /opt/Elastic/Endpoint, which are created when Endpoint is installed, exist but those that would be created when Endpoint runs for the first time do not exist. So I think we need to figure out why Endpoint is not running after being installed.

The same Endpoint executable is used to install Endpoint as to run it after installation, so since Endpoint is successfully installing those files we can tell that the Endpoint executable is able to run on this host. There is just something preventing it from running after it's been installed.
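A rough way to tell these cases apart from a shell is sketched below. It assumes that a populated state/log directory means Endpoint has run at least once, which is inferred from this thread rather than documented behavior. The function name `endpoint_install_state` and its three labels are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an Endpoint install directory.
# Assumption: files under state/log only appear once Endpoint has run.
endpoint_install_state() {
  local root="${1:-/opt/Elastic/Endpoint}"
  if [ ! -x "$root/elastic-endpoint" ]; then
    echo "not-installed"          # executable missing: the install never completed
  elif [ -z "$(ls -A "$root/state/log" 2>/dev/null)" ]; then
    echo "installed-never-ran"    # install files exist, but no run-time files
  else
    echo "has-run"                # run-time files exist
  fi
}

# Usage (as root): endpoint_install_state
```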

  1. What type of system is this? What's the Linux distribution and version? Is SELinux enabled? Are there any other system restrictions or any other security software that might prevent Endpoint from running?

  2. Is the Endpoint service running? Endpoint supports systemd and upstart; the systemd status command is sudo systemctl status ElasticEndpoint. You can also look for a process with the command line /opt/Elastic/Endpoint/elastic-endpoint run.

  3. If the service isn't running, can you start it? (sudo systemctl start ElasticEndpoint)

  4. If you can't start the service, what happens if you run Endpoint manually? (sudo /opt/Elastic/Endpoint/elastic-endpoint run --log stdout --log-level DEBUG)

    • That will run Endpoint in the foreground. With that command you'll see a few log messages before Endpoint switches from the command-line log configuration to the logging configuration in /opt/Elastic/Endpoint/elastic-endpoint.yaml. So it's expected that logging stops after a few messages if Endpoint keeps running.

  5. If that doesn't work, what happens if you copy /opt/Elastic/Endpoint/elastic-endpoint to /opt/Elastic/Agent/elastic-endpoint and run it from there? There might be a system policy preventing Endpoint from running from within its own directory but not from Agent's. When Agent runs Endpoint in install mode, it launches elastic-endpoint from within Agent's own directory (using an identical binary with the name endpoint-security).

  6. Are there any system syslog messages that highlight what is happening?
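The read-only parts of that checklist (steps 1, 2 and 6) can be gathered in one pass. The sketch below assumes a systemd host for the service and journal checks; `getenforce` is only present where the SELinux tools are installed, and `pgrep -a` needs a reasonably recent procps. The names `run_if_present` and `endpoint_triage` are made up for illustration. Steps 3 to 5 change system state, so they are left out on purpose.

```shell
#!/usr/bin/env bash
# Run a command only if it exists on this host; a failing command
# never aborts the rest of the checklist.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" || true
  else
    echo "skipped: $1 not found"
  fi
}

# Hypothetical wrapper around the read-only checks from the list above.
endpoint_triage() {
  echo "== Distribution =="
  cat /etc/os-release 2>/dev/null || true
  echo "== SELinux mode =="
  run_if_present getenforce
  echo "== Service status (systemd) =="
  run_if_present systemctl status ElasticEndpoint --no-pager
  echo "== Process =="
  run_if_present pgrep -af 'elastic-endpoint run'
  echo "== Recent service log (systemd journal) =="
  run_if_present journalctl -u ElasticEndpoint -n 20 --no-pager
}

# Usage (as root): endpoint_triage > endpoint-triage.txt 2>&1
```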

Step 5 solved the issue for us.
Thank you @ferullo for saving the day!

Great, I'm glad to hear it. Just to make sure it's clear to you and anyone else reading this: although you can run Endpoint out of /opt/Elastic/Agent/, you shouldn't do so long term. Updating the system policy so Endpoint can run from its correct location is the long-term solution.

It might seem attractive to just update Endpoint's upstart/systemd service to point to the /opt/Elastic/Agent/ location that elastic-endpoint was copied to. However, Endpoint's lifecycle is fully managed by Agent; Agent can reinstall Endpoint at any time, updating that service configuration and undoing the change.
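Once the system policy is fixed, one way to confirm the service still points at the original location is to read the path from the unit file. This is a sketch under assumptions: it expects a systemd unit with a single ExecStart= line whose first word is the executable path, and the function names `unit_exec_path` and `check_unit_path` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read unit file text on stdin and print the
# executable path from the first ExecStart= line (prefix characters
# such as "-" or "@" are dropped).
unit_exec_path() {
  sed -n 's/^ExecStart=[-@:+!]*\([^ ]*\).*/\1/p' | head -n 1
}

# Compare that path with the location Endpoint is installed to.
check_unit_path() {
  local expected="/opt/Elastic/Endpoint/elastic-endpoint"
  local actual
  actual="$(unit_exec_path)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    echo "unexpected: $actual"
    return 1
  fi
}

# Usage: systemctl cat ElasticEndpoint | check_unit_path
```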
