Agent "Unhealthy": "Error while dialing open \\.\pipe\elastic-agent-[...]"

I upgraded ELK to 7.15 and had some errors starting Kibana, so I made the mistake of deleting the .kibana index (someone suggested it somewhere). As a result, I had to set everything up again from scratch. I have also updated the Elastic Agent for the Fleet Server (which runs on the same VM as Elasticsearch/Kibana). I set up a policy for the Fleet Server and copy/pasted the enrollment command from Fleet to enroll the agent. It runs successfully, although I have to add --insecure to bypass certificate verification. The ELK server itself is reporting fine, and a detection rule test worked.

Next I wanted to upgrade the agents connected to the Fleet Server. Since the upgrade via the Kibana interface doesn't work, I upgraded them manually. I took a Windows Server VM and enrolled it in --insecure mode. It produced no errors, but in the Kibana Fleet Agents view it went "Online" for a few seconds and then switched back to yellow "Unhealthy". My detection rule test failed, so I looked at the logs, but there were only logs without an event category. I then looked at the client again, and when I run elastic-agent status I get the error below. Any ideas what it means and how to fix it?

Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing open \\\\.\\pipe\\elastic-agent-e841deaceb2a76970405be8d93857858a506994c7e76b3e05c28d5cff44f1050: The system cannot find the file specified."

The logs I could find in the timeline are mostly "Cannot index event publisher.Event[...]" messages, ending with:

{\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse field [event.module] of type [constant_keyword] in document with id '8HcIRnwBsSwNACZSZs_t'. Preview of field's value: 'security'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"[constant_keyword] field [event.module] only accepts values that are equal to the value defined in the mappings [system], but got [security]\"}}"
}
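The key part is the caused_by reason: event.module is mapped as constant_keyword, a field type that holds a single value for the whole index (here "system"), so any document carrying a different value (here "security") is rejected. When many of these errors pile up, a minimal Python sketch like the one below can pull out the field, the mapped value and the received value. It assumes the error body has been copied out of the log with the \" escaping removed; the function name is hypothetical.

```python
import json
import re

# Error body from the log above, with the \" escaping removed.
ERROR = """{"type":"mapper_parsing_exception","reason":"failed to parse field [event.module] of type [constant_keyword] in document with id '8HcIRnwBsSwNACZSZs_t'. Preview of field's value: 'security'","caused_by":{"type":"illegal_argument_exception","reason":"[constant_keyword] field [event.module] only accepts values that are equal to the value defined in the mappings [system], but got [security]"}}"""

# Matches the wording of the caused_by reason shown in the log.
PATTERN = re.compile(
    r"\[constant_keyword\] field \[(?P<field>[^\]]+)\] only accepts values "
    r"that are equal to the value defined in the mappings \[(?P<mapped>[^\]]+)\], "
    r"but got \[(?P<got>[^\]]+)\]"
)


def constant_keyword_conflict(error_json: str):
    """Return (field, mapped_value, received_value), or None if not a constant_keyword clash."""
    err = json.loads(error_json)
    reason = err.get("caused_by", {}).get("reason", "")
    match = PATTERN.search(reason)
    if match is None:
        return None
    return match.group("field"), match.group("mapped"), match.group("got")


print(constant_keyword_conflict(ERROR))
# ('event.module', 'system', 'security')
```

Grouping the results by (field, mapped value) shows which data stream's mapping is receiving events it was not created for.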

Any help would be appreciated!

Hello? Can anybody help?

"File not found" may mean the agent is not running. Can you check that the agent service is OK?
Agent logs would be helpful as well.
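The error comes from the status command failing to open the agent's named pipe, which the daemon creates when it runs. A quick way to confirm whether that pipe exists is to list the Windows named-pipe namespace. Below is a minimal Python sketch, assuming only that the pipe name starts with elastic-agent- as in the error messages in this thread; the function names are hypothetical, and the enumeration part works on Windows only.

```python
import os
import sys

# Prefix taken from the pipe names in the error messages; the hash suffix varies.
PIPE_PREFIX = "elastic-agent-"


def agent_pipes(names):
    """Return the pipe names that look like an Elastic Agent control pipe."""
    return sorted(n for n in names if n.startswith(PIPE_PREFIX))


def list_agent_pipes():
    """Windows only: enumerate the named-pipe namespace and filter it."""
    if sys.platform != "win32":
        raise OSError("named-pipe enumeration only works on Windows")
    return agent_pipes(os.listdir(r"\\.\pipe\\"))


if __name__ == "__main__" and sys.platform == "win32":
    pipes = list_agent_pipes()
    if pipes:
        print("agent control pipe(s) present:", *pipes, sep="\n  ")
    else:
        print("no elastic-agent-* pipe found; the agent daemon is probably not running")
```

If no such pipe exists while the service reports "running", the service process is up but the agent daemon inside it is not listening, which points back at the agent logs.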

Hi, sorry for the late reply.

I checked and Elastic Agent is "running"
Elastic Endpoint is set to "Automatic" but was stopped. I tried to start it manually, but it stopped again shortly afterwards.

Agent logs I could find:

Hope this helps.

An Exec failure means the agent tried to start a process, but it failed for some reason.
We can see that Endgame failed to upgrade the existing installation. Let me pull in somebody from the Endpoint team to take a look.


Hi Michal,
I have a similar problem.
I can install and enroll Elastic Agent 7.15.0 on two Windows Server 2016.
However on a third one I get a fail to enroll error.
I followed the troubleshooting guide and installed the agent in standalone mode: elastic-agent.exe install -f

The service is up and running; however, when I check the status with elastic-agent.exe status, I get:

Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing open \\.\pipe\elastic-agent-63c11620568add7064b6e682520fd6cdad079b720abb314622cac6a93a38626a: The system cannot find the file specified."

The dirty workaround was to stop the service and enroll it manually:
"C:\Program Files\Elastic\Agent\elastic-agent.exe" enroll --fleet-server-es=https://server:9200 --fleet-server-service-token=xxxxxx --fleet-server-policy=xxxxx

I wonder why enrollment fails from time to time.

To diagnose why Endpoint's upgrade failed, you can check Elastic Agent's log output. It should include logs from Endpoint's attempt to upgrade. If you don't see any logs, you can try to upgrade Endpoint manually to see why it is failing.

To upgrade Endpoint manually, find endpoint-security-7.15.1-windows-x86_64.zip in the Agent's downloads directory and unzip it. Then run: endpoint-security.exe install --resources endpoint-security-resources.zip --upgrade --log stdout --log-level debug
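The manual upgrade can also be scripted so it is repeatable across hosts. Below is a minimal Python sketch of those two steps (unzip, then run the install command). Assumptions: the downloads directory path is passed in because the thread doesn't give it, endpoint-security-resources.zip sits next to endpoint-security.exe inside the archive, and the script runs from an elevated prompt on the Windows host. The function names are hypothetical. By default it only builds the command; pass run=True to execute it.

```python
import subprocess
import zipfile
from pathlib import Path

# Artifact name from the thread; adjust to the version actually downloaded.
ARTIFACT = "endpoint-security-7.15.1-windows-x86_64.zip"


def upgrade_command(exe: Path) -> list[str]:
    """The suggested upgrade invocation, with debug logging to stdout."""
    return [
        str(exe), "install",
        "--resources", "endpoint-security-resources.zip",
        "--upgrade",
        "--log", "stdout",
        "--log-level", "debug",
    ]


def manual_upgrade(downloads_dir: Path, work_dir: Path, run: bool = False) -> list[str]:
    """Unzip the Endpoint artifact and optionally run the upgrade from inside it.

    downloads_dir is hypothetical: point it at the Agent's downloads directory.
    Assumes endpoint-security-resources.zip sits next to endpoint-security.exe.
    """
    with zipfile.ZipFile(downloads_dir / ARTIFACT) as zf:
        zf.extractall(work_dir)
    # Raises StopIteration if the archive has no endpoint-security.exe.
    exe = next(work_dir.rglob("endpoint-security.exe"))
    cmd = upgrade_command(exe)
    if run:
        # Run from the extracted folder so the relative --resources path resolves;
        # output streams straight to the console because of --log stdout.
        subprocess.run(cmd, cwd=exe.parent, check=False)
    return cmd
```

Keeping run=False as the default lets you inspect the exact command before letting it touch a protected installation.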

It seems you are right: Endpoint Security can't upgrade because it appears to be missing permissions. I tried reinstalling everything, with the same result. I confirmed that no changes at all are allowed in the Program Files\Elastic\Endpoint directory, which means I can't even delete it.
What should I do now?

Here are the latest logs:

PS C:\Program Files\Elastic\Agent> .\elastic-agent.exe status
Status: FAILED
Message: (no message)
Applications:
  * metricbeat  (HEALTHY)
    Running
  * endpoint-security   (FAILED)
    operation 'Exec' failed (return code: 231): 2021-11-02 17:06:46: info: Main.cpp:347 Upgrading existing installation (protected)
2021-11-02 17:06:46: info: InstallLib.cpp:405 Attempting to create a rollback package
2021-11-02 17:06:47: info: InstallLib.cpp:300 Running [c:\program files\elastic\agent\data\elastic-agent-5ae799\install\endpoint-security-7.15.1-windows-x86_64previouselastic-endpoint.exe] [uninstall --rollback --log stdout]
2021-11-02 17:08:49: info: InstallLib.cpp:319 Upgrade helper succeeded with output 2021-11-02 17:06:47: info: Main.cpp:253 Executing uninstall with rollback
2021-11-02 17:06:47: debug: File.cpp:471 Removing [C:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml]
2021-11-02 17:06:47: info: File.cpp:501 Attempted deletion failed, failed to reset file attributes for C:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml
2021-11-02 17:06:47: debug: Internal.cpp:463 Removal of C:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml failed while creating rollback archive.
2021-11-02 17:06:47: debug: File.cpp:471 Removing [C:\Windows\System32\Drivers\elastic-endpoint-driver.sys]
2021-11-02 17:06:47: info: File.cpp:501 Attempted deletion failed, failed to reset file attributes for C:\Windows\System32\Drivers\elastic-endpoint-driver.sys
2021-11-02 17:06:47: debug: Internal.cpp:463 Removal of C:\Windows\System32\Drivers\elastic-endpoint-driver.sys failed while creating rollback archive.
2021-11-02 17:06:47: debug: File.cpp:471 Removing [C:\Windows\System32\Drivers\ElasticElam.sys]
2021-11-02 17:06:47: info: File.cpp:501 Attempted deletion failed, failed to reset file attributes for C:\Windows\System32\Drivers\ElasticElam.sys
2021-11-02 17:06:47: debug: Internal.cpp:463
...
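With output this long, a small parser makes it easier to spot the lines that matter. Below is a minimal Python sketch, assuming every log line follows the `<date> <time>: <level>: <source>:<line> <message>` layout visible above. Lines that don't match, such as wrapped lines or the truncated final Internal.cpp:463 entry, are skipped. The helper names are hypothetical: undeletable_files lists the paths behind the "Attempted deletion failed" messages, and problems keeps only warning and error lines.

```python
import re
from typing import Iterator, NamedTuple

# "<date> <time>: <level>: <source>:<line> <message>", as in the Endpoint output.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>\w+): (?P<source>\S+) (?P<message>.*)$"
)
DELETE_FAILED = "Attempted deletion failed, failed to reset file attributes for "


class LogLine(NamedTuple):
    ts: str
    level: str
    source: str
    message: str


def parse(lines) -> Iterator[LogLine]:
    """Yield structured entries, skipping lines that don't fit the layout."""
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match:
            yield LogLine(**match.groupdict())


def undeletable_files(lines) -> list[str]:
    """Paths the uninstaller could not remove while building the rollback package."""
    return [
        entry.message[len(DELETE_FAILED):]
        for entry in parse(lines)
        if entry.message.startswith(DELETE_FAILED)
    ]


def problems(lines, levels=("warning", "error")) -> list[LogLine]:
    """Only the entries at the given levels, in original order."""
    return [entry for entry in parse(lines) if entry.level in levels]
```

Feeding it the full log (for example, `undeletable_files(open("endpoint.txt"))`, where the file name is just a placeholder) gives a short list of protected files instead of a wall of debug output.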

Can you share the rest of the logs after the ...? If needed you can PM them to me.

Some key lines from the logs you linked on Pastebin:

2021-10-20 22:55:14: info: Util.cpp:579 Starting stopped Endpoint to allow service command
2021-10-20 22:55:14: info: Util.cpp:597 Sending sevice command to facilitate uninstall
2021-10-20 22:55:35: error: Util.cpp:619 Failure sending DisablePPL message to allow our service to stop. Send status: Not found. Command Status: Success
2021-10-20 22:55:35: warning: Util.cpp:980 Error encountered while unprotecting service for uninstall

In Elastic Endpoint 7.13.x (it appears 7.13.2 is installed), the uninstallation procedure requires interprocess communication with our running service process. During the uninstallation step of the upgrade, the installer found that the Elastic Endpoint service was not running; it then tried to start the service and communicate with it, and that is the step that appears to be failing.

We might be able to discover the cause of the failure by examining the log file (c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log). You won't be able to browse to that path in an Explorer window, but you should be able to copy it from an elevated Administrator command prompt (copy "c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log"). If you can PM that file, along with "c:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml", hopefully we can identify the cause of the failure.
