Elastic Agent 8.1.1 fails to upgrade to 8.1.3

Elastic Agent fails to upgrade from 8.1.1 to 8.1.3.
This is an on-prem Elastic instance, pulling software downloads directly from Elastic.

I start the upgrade by selecting the agents in Fleet.
The Fleet policy consists of Endpoint only.

After I click Upgrade, the agent briefly moves to the blue "upgrading" state. Watching the temp folders and the agent download folder shows no 8.1.3 files being pulled down, and no firewall policy is blocking the test machines from downloading.

After about 30 seconds the agent moves into the Unhealthy state with no further change. After 3 minutes the agents move back into the Healthy state, still running 8.1.1.

A couple of odd things to note. When the agent starts the attempted upgrade, it logs:

{"log.level":"warn","@timestamp":"2022-04-20T18:00:28.064Z","log.logger":"transport","log.origin":{"file.name":"transport/tcp.go","file.line":52},"message":"DNS lookup failure \"servername\": lookup servername: no such host","ecs.version":"1.6.0"}

This is clearly false, since the FQDN is used to connect to Elastic and the Fleet Server. This warning appears after the upgrade starts and is then followed by:

{"log.level":"error","@timestamp":"2022-04-20T18:42:16.223Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-04-20T11:42:16-07:00 - message: Application: [797f85a7-8d00-4c64-9bb0-9f8b1a6583df]: State changed to FAILED: failed verification of agent binary: 2 errors occurred:\n\t* fetching asc file from 'C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7f30bb\\downloads\\elastic-agent-8.1.3-windows-x86_64.zip.asc': open C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7f30bb\\downloads\\elastic-agent-8.1.3-windows-x86_64.zip.asc: The system cannot find the file specified.\n\t* open C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7f30bb\\downloads\\elastic-agent-8.1.3-windows-x86_64.zip.sha512: The system cannot find the file specified.\n\n - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

Failed download. According to our firewall logs, the machine didn't attempt to download anything; it just went to FAILED instantly. Did someone forget to add the files to the repos the agents pull from? That would be funny rather than a bug.
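The FAILED message says the agent could open neither the `.asc` signature nor the `.sha512` checksum that should sit next to `elastic-agent-8.1.3-windows-x86_64.zip` in its `downloads` folder. Below is a minimal Python sketch for checking that folder by hand. `verify_download` is a hypothetical helper, not part of Elastic Agent; the file names are copied from the log, and the `.sha512` file is assumed to use the common `sha512sum` layout (`<hex digest>  <file name>`). It does not verify the GPG signature in the `.asc` file, which would need Elastic's public key.

```python
import hashlib
from pathlib import Path


def verify_download(downloads_dir, version="8.1.3", platform="windows-x86_64"):
    """Return a list of problems found for one downloaded agent artifact.

    Hypothetical helper. Assumes the .sha512 sidecar holds
    "<hex digest>  <file name>" (sha512sum style).
    """
    archive = Path(downloads_dir) / f"elastic-agent-{version}-{platform}.zip"
    sha_file = archive.with_name(archive.name + ".sha512")
    asc_file = archive.with_name(archive.name + ".asc")
    problems = []

    # The log shows the agent expects the archive plus two sidecar files.
    for path in (archive, sha_file, asc_file):
        if not path.is_file():
            problems.append(f"missing: {path.name}")

    # Compare the archive's SHA-512 with the first token of the .sha512 file.
    if archive.is_file() and sha_file.is_file():
        parts = sha_file.read_text().split()
        expected = parts[0].lower() if parts else ""
        digest = hashlib.sha512()
        with archive.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            problems.append("sha512 mismatch")

    # The .asc GPG signature is only checked for presence, not validity.
    return problems
```

Pointed at a path like `C:\Program Files\Elastic\Agent\data\elastic-agent-7f30bb\downloads`, an empty result would mean the files are present and the checksum matches; three "missing" entries would match what the log reports, i.e. nothing was downloaded at all.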

So far this is on Windows 10 21H2 and Server 2019. Both show the same events, and the failure is repeatable each time you run the upgrade from Fleet.
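The DNS warning logged at the start of the upgrade names the bare host `servername` rather than an FQDN, so it may be worth confirming from an affected machine which names actually resolve. Below is a minimal Python sketch using the operating system's resolver through `socket.getaddrinfo`; `resolve` is a hypothetical helper, and `servername` is only the placeholder from the log.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses the OS resolver gives for hostname.

    An empty list means the lookup failed, the same condition the agent
    reports as "DNS lookup failure ... no such host".
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, calling `resolve("servername")` and `resolve` on the full FQDN of the Fleet Server from the same host would show whether only the short name fails, which would fit the warning without contradicting the fact that the FQDN works.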

It looks like you're running into this known issue: Fleet and Elastic Agent 8.1.1 | Fleet and Elastic Agent Guide [8.1] | Elastic

More than likely it's the same one. I just wanted to point it out so it can be resolved, maybe in 8.2. I will not be updating 500+ agents by hand again...

I think 8.1.3 contains the fix: Fleet and Elastic Agent 8.1.3 | Fleet and Elastic Agent Guide [8.1] | Elastic. The problem is probably that the bug in 8.1.1 itself prevents the upgrade to 8.1.3, so agents on 8.1.1 can't pick up the fix by upgrading.

Based on the issue linked from the fix PR (specifically this issue: verifier fails in snapshot builds for downloaded artifacts · Issue #252 · elastic/elastic-agent · GitHub), I think you can move the agent to a blank policy, after which the upgrade should work. (It seems to fail when the agent needs to pull something that isn't bundled with Elastic Agent by default.) I'd suggest trying that as a workaround.

The upgrade to 8.1.3 still failed with a blank policy. I'm not surprised; from the patch notes and GitHub I expected the same bugs to still exist.

So far the 8.x branch has been a headache. I'm going to wait out a few more versions and see if it can "maybe" be sorted out. If not, I think I'll be done with the agent for a while. It's taking more than 50% of my time to keep it running. After the file-lock issue that caused a surprising, close to six-figure revenue loss, and now the inability to upgrade, it's not stable enough for an enterprise environment. It reminds me of the days when I was forced to use what is now VMware Defense, back when it was still the original company and still being developed.
