I upgraded ELK to 7.15 and got some errors starting Kibana, so I made the mistake of deleting the .kibana index (someone suggested it somewhere). In the end I basically had to set everything up from scratch. I have now also updated the Elastic Agent for the Fleet Server, which sits on the same VM as the Elastic/Kibana server. I set up a policy for the Fleet Server and copy/pasted the command shown in Fleet to enroll the agent. It runs successfully, though I have to include --insecure to bypass certificate verification. The ELK server itself is reporting fine, and a rule detection test worked.
Next I wanted to upgrade the Agents connected to the Fleet Server. As the upgrade via the Kibana interface doesn't work, I upgraded them manually. I took a Windows Server VM and tried to enroll it with --insecure. It produced no errors, but in the Kibana Fleet Agents view it went "Online" for a few seconds and then switched back to yellow "Unhealthy". My rule detection test failed, so I looked at the logs, but there were only logs without an event category. I looked at the client again, and when I run elastic-agent status I get the error below. Any ideas what it means and how to fix it?
Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing open \\\\.\\pipe\\elastic-agent-e841deaceb2a76970405be8d93857858a506994c7e76b3e05c28d5cff44f1050: The system cannot find the file specified."
The logs I could find in the timeline are mostly Cannot index event publisher.Event[...], ending with this message:
{\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse field [event.module] of type [constant_keyword] in document with id '8HcIRnwBsSwNACZSZs_t'. Preview of field's value: 'security'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"[constant_keyword] field [event.module] only accepts values that are equal to the value defined in the mappings [system], but got [security]\"}}"
}
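For anyone reading the rejection above: the index mapping defines event.module as a constant_keyword fixed to system, and the incoming document carried security, so Elasticsearch refused it. Below is a minimal Python sketch that pulls those three values out of such an error. The function name explain_mismatch and the SAMPLE_ERROR string (an unescaped copy of the message above) are made up for illustration and are not part of any Elastic tooling.

```python
import json
import re

# Matches the "reason" text of a constant_keyword mismatch as it appears in the log above.
MISMATCH = re.compile(
    r"\[constant_keyword\] field \[(?P<field>[^\]]+)\] only accepts values that are "
    r"equal to the value defined in the mappings \[(?P<expected>[^\]]*)\], "
    r"but got \[(?P<got>[^\]]*)\]"
)


def explain_mismatch(error_json):
    """Return (field, expected, got) for a constant_keyword mismatch, else None."""
    error = json.loads(error_json)
    reason = error.get("caused_by", {}).get("reason", "")
    match = MISMATCH.search(reason)
    if match is None:
        return None
    return match.group("field"), match.group("expected"), match.group("got")


# Unescaped copy of the error body from the log above (illustrative input).
SAMPLE_ERROR = json.dumps({
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [event.module] of type [constant_keyword] "
              "in document with id '8HcIRnwBsSwNACZSZs_t'. "
              "Preview of field's value: 'security'",
    "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "[constant_keyword] field [event.module] only accepts values that are "
                  "equal to the value defined in the mappings [system], "
                  "but got [security]",
    },
})

if __name__ == "__main__":
    print(explain_mismatch(SAMPLE_ERROR))
```

Run over a batch of "Cannot index event" entries, a helper like this would show whether every rejection is the same system/security mismatch or whether several fields are affected.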
I checked and Elastic Agent is "running".
Elastic Endpoint is set to "Automatic" but was stopped. I tried to start it manually, but it stopped again shortly afterwards.
Exec failure means the agent tried to start a process but it failed for some reason.
We can see that endgame failed to upgrade the existing installation. Let me pull in somebody from the Endpoint team to take a look.
Hi Michal,
I have a similar problem.
I can install and enroll Elastic Agent 7.15.0 on two Windows Server 2016 machines.
However, on a third one I get a "fail to enroll" error.
I followed the troubleshooting guide and installed the agent in standalone mode: elastic-agent.exe install -f
The service is up and running; however, when I check the status with elastic-agent.exe status, I get:
Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing open \\.\pipe\elastic-agent-63c11620568add7064b6e682520fd6cdad079b720abb314622cac6a93a38626a: The system cannot find the file specified."
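Read literally, the error says the named pipe that elastic-agent status tries to open does not exist at that moment. A rough Python sketch for checking which elastic-agent-* pipes are present on the machine follows. It assumes that os.listdir can enumerate the \\.\pipe\ namespace (Windows only), and the helper names are hypothetical.

```python
import os

PIPE_ROOT = "\\\\.\\pipe\\"      # the Windows named-pipe namespace, i.e. \\.\pipe\
AGENT_PREFIX = "elastic-agent-"  # prefix seen in the error message above


def agent_pipes(pipe_names):
    """Keep only the pipe names that look like Elastic Agent control pipes."""
    return sorted(name for name in pipe_names if name.startswith(AGENT_PREFIX))


def list_agent_pipes():
    """List matching pipes on this machine (Windows only, see assumption above)."""
    return agent_pipes(os.listdir(PIPE_ROOT))


if __name__ == "__main__":
    if os.name == "nt":
        found = list_agent_pipes()
        if found:
            print("\n".join(found))
        else:
            print("no elastic-agent pipe found; the daemon does not seem to be listening")
    else:
        print("this check only makes sense on Windows")
```

If a pipe shows up with a different hash than the one in the error, that would hint at a mismatch between the running service and the binary used for the status call, but that is only a guess.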
The dirty workaround was to stop the service and enroll it manually:
"C:\Program Files\Elastic\Agent\elastic-agent.exe" enroll --fleet-server-es=https://server:9200 --fleet-server-service-token=xxxxxx --fleet-server-policy=xxxxx
To diagnose why Endpoint's upgrade failed, you can check Elastic Agent's log output. It should include logs from Endpoint's attempt to upgrade. If you don't see any logs, you can try to upgrade Endpoint manually to see why it is failing.
To manually upgrade Endpoint, find endpoint-security-7.15.1-windows-x86_64.zip in Agent's downloads directory and unzip it. Then run endpoint-security.exe install --resources endpoint-security-resources.zip --upgrade --log stdout --log-level debug
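The two manual steps above (unzip, then run the installer with debug logging) could be scripted roughly as follows. This is only a sketch: the function names are invented, the caller has to supply the real zip path and target directory, and it assumes endpoint-security.exe and endpoint-security-resources.zip end up directly in the directory passed in, which may not match the actual archive layout. It needs to run from an elevated prompt.

```python
import subprocess
import zipfile
from pathlib import Path


def build_upgrade_command(endpoint_dir):
    """Build the manual upgrade command quoted above for the given directory."""
    endpoint_dir = Path(endpoint_dir)
    return [
        str(endpoint_dir / "endpoint-security.exe"),
        "install",
        "--resources", str(endpoint_dir / "endpoint-security-resources.zip"),
        "--upgrade",
        "--log", "stdout",
        "--log-level", "debug",
    ]


def manual_upgrade(zip_path, extract_to):
    """Unzip the Endpoint package, run the upgrade, and return its exit code."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_to)
    # Assumption: the executable sits at the top level of extract_to.
    # Adjust the directory if the archive unpacks into a subfolder.
    result = subprocess.run(build_upgrade_command(extract_to), check=False)
    return result.returncode
```

Because the log goes to stdout, redirecting the script's output to a file gives you something to attach to the thread.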
It seems you are right. Endpoint Security can't upgrade because it appears to be missing permissions. I tried to reinstall everything: same problem. I could confirm that no change to the ProgramFiles/Elastic/Endpoint directory is allowed, which means I can't even delete it.
What to do now?
Some key lines from the logs on Pastebin that you linked...
2021-10-20 22:55:14: info: Util.cpp:579 Starting stopped Endpoint to allow service command
2021-10-20 22:55:14: info: Util.cpp:597 Sending sevice command to facilitate uninstall
2021-10-20 22:55:35: error: Util.cpp:619 Failure sending DisablePPL message to allow our service to stop. Send status: Not found. Command Status: Success
2021-10-20 22:55:35: warning: Util.cpp:980 Error encountered while unprotecting service for uninstall
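Lines like these can be filtered out of a longer log with a few lines of Python. The sketch below assumes every entry follows the "timestamp: level: File.cpp:line message" shape seen in the excerpt above; the pattern and function name are illustrative only and not an official description of the Endpoint log format.

```python
import re

# Shape inferred from the excerpt above: "<date> <time>: <level>: <file>:<line> <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>\w+): (?P<source>\S+:\d+) (?P<message>.*)$"
)


def find_problems(lines, levels=("error", "warning")):
    """Return (timestamp, level, source, message) for lines at the given levels."""
    problems = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            problems.append(match.group("timestamp", "level", "source", "message"))
    return problems


# The four lines quoted above, copied verbatim.
EXCERPT = [
    "2021-10-20 22:55:14: info: Util.cpp:579 Starting stopped Endpoint to allow service command",
    "2021-10-20 22:55:14: info: Util.cpp:597 Sending sevice command to facilitate uninstall",
    "2021-10-20 22:55:35: error: Util.cpp:619 Failure sending DisablePPL message to allow our "
    "service to stop. Send status: Not found. Command Status: Success",
    "2021-10-20 22:55:35: warning: Util.cpp:980 Error encountered while unprotecting service "
    "for uninstall",
]

if __name__ == "__main__":
    for timestamp, level, source, message in find_problems(EXCERPT):
        print(f"{timestamp} [{level}] {source}: {message}")
```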
In Elastic Endpoint 7.13.x (it appears 7.13.2 is installed) we require interprocess communications to our running service process as part of the uninstallation procedure. The uninstallation step of the upgrade has found that the Elastic Endpoint service is not running, but after trying to start it and communicate with it, that appears to be failing.
We might be able to discover the cause of the failure by examining the log file ( c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log). You won't be able to browse to that path in an Explorer window, but should be able to copy it from an elevated Administrator command prompt (copy "c:\Program Files\Elastic\Endpoint\state\log\endpoint-000000.log"). If you'd be able to PM that file, as well as the "c:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml" file, hopefully we can identify the cause of the failure.