Elastic agent upgrade error

There are similar topics in the forum, but they are already closed, so I am opening a new one. After upgrading the stack from 8.1.1 to 8.2.3, we are unable to upgrade agents using Fleet. When I trigger the upgrade, the agent's state changes to upgrading,

then, after some time, to unhealthy,

and then back to healthy, but the agent stays at the same version and is not updated.
Looking at the log files, I see that it can't find the files for the upgrade, and it is not trying to download them from artifacts.elastic.co:

[elastic_agent][error] 2022-06-27T15:47:35+03:00 - message: Application: [081c640d-fac2-4cd7-b629-51f09907b3c5]: State changed to FAILED: failed verification of agent binary: 2 errors occurred:
	* fetching asc file from '/opt/Elastic/Agent/data/elastic-agent-7f30bb/downloads/elastic-agent-8.2.3-linux-x86_64.tar.gz.asc': open /opt/Elastic/Agent/data/elastic-agent-7f30bb/downloads/elastic-agent-8.2.3-linux-x86_64.tar.gz.asc: no such file or directory
	* open /opt/Elastic/Agent/data/elastic-agent-7f30bb/downloads/elastic-agent-8.2.3-linux-x86_64.tar.gz.sha512: no such file or directory

 - type: 'ERROR' - sub_type: 'FAILED'
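
For anyone hitting the same error, here is a minimal sketch for checking which of the two verification files are missing on a host. The function name `missing_verification_files` is made up for illustration, and the file naming (`linux-x86_64.tar.gz` plus `.asc` / `.sha512`) is simply copied from the log lines above, so adjust it for other platforms.

```shell
#!/bin/sh
# Print the verification files the agent expects but that are not present.
# Usage: missing_verification_files <downloads_dir> <version>
# (hypothetical helper, not part of elastic-agent)
missing_verification_files() {
    downloads_dir=$1
    version=$2
    # Naming taken from the error message above (assumes the linux-x86_64 tarball).
    base="elastic-agent-${version}-linux-x86_64.tar.gz"
    for suffix in asc sha512; do
        file="${downloads_dir}/${base}.${suffix}"
        # Print the path only when the file does not exist.
        [ -f "$file" ] || printf '%s\n' "$file"
    done
}

# Example call, using the path from the log above:
# missing_verification_files /opt/Elastic/Agent/data/elastic-agent-7f30bb/downloads 8.2.3
```

If the function prints nothing, both files are in place and the verification failure has a different cause.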

I also did an uninstall and a fresh install of the agent on one of the hosts; that went without a problem and the host is healthy. From that I can only conclude that the problem is not in the connection between Fleet Server and the agent.

This might be an instance of a known issue. The issue may be addressed once you upgrade your agents to 8.2.0.

Does this work only with a manual upgrade? We have upgraded our stack to 8.3, and Fleet now has an option to choose the agent version to upgrade to, but the same error appears when upgrading to either 8.2 or 8.3.

Yeah, from what I can tell a manual upgrade will be required unfortunately.

I tried the workaround mentioned here - [Agent-Upgrade]: For Linux .tar deploy; Agent goes Unhealthy on upgrade with Endpoint Security · Issue #173 · elastic/elastic-agent · GitHub

  1. Download elastic-agent-8.2.0-linux-x86_64.tar.gz.sha512 and elastic-agent-8.2.0-linux-x86_64.tar.gz.asc from the Elastic Agent download page
  2. Place the files in /opt/Elastic/Agent/data/elastic-agent-xxxxxx/downloads
     - tested on Debian and RPM distributions
     - ensure the files are owned by root:root and have permissions of 640
  3. Run the Fleet upgrade process
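
The workaround steps above can be sketched roughly as follows. This is only an illustration: the function names are made up, and the URL pattern is an assumption that the `.sha512` and `.asc` files sit next to the tarball on artifacts.elastic.co. The `elastic-agent-xxxxxx` directory name differs per host and has to be filled in by hand.

```shell
#!/bin/sh
# Print the download URLs of the two verification files for a version.
# Assumption: they are published alongside the linux-x86_64 tarball.
verification_urls() {
    version=$1
    base="https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-${version}-linux-x86_64.tar.gz"
    printf '%s\n' "${base}.sha512" "${base}.asc"
}

# Copy already-downloaded verification files into the agent's downloads
# directory with mode 640. Ownership is set to root:root only when the
# function runs as root.
# Usage: stage_verification_files <src_dir> <downloads_dir> <version>
stage_verification_files() {
    src_dir=$1
    downloads_dir=$2
    version=$3
    base="elastic-agent-${version}-linux-x86_64.tar.gz"
    for suffix in sha512 asc; do
        dest="${downloads_dir}/${base}.${suffix}"
        install -m 640 "${src_dir}/${base}.${suffix}" "$dest" || return 1
        if [ "$(id -u)" -eq 0 ]; then
            chown root:root "$dest" || return 1
        fi
    done
}

# Example (run as root, replace xxxxxx with the directory on your host):
# for url in $(verification_urls 8.2.0); do curl -fsSLO "$url"; done
# stage_verification_files . /opt/Elastic/Agent/data/elastic-agent-xxxxxx/downloads 8.2.0
```

After the files are staged, the upgrade is still triggered from Fleet as in the last step.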

That way we successfully upgraded the agents to 8.2. I then initiated the upgrade to 8.3; it also went through, but after some time the agent went offline. I restarted the agent on one of the hosts, and after that its status changed to

Active: activating (auto-restart) (Result: exit-code) since Wed 2022-06-29 08:41:39 CDT; 1min 23s ago

From the log files: elastic-agent.service: Failed with result 'exit-code'.

When I execute the command elastic-agent status, it shows

/usr/bin/elastic-agent: 2: exec: /opt/Elastic/Agent/elastic-agent: not found

Then, checking whether that path exists, I get

lrwxrwxrwx 1 root root 58 Jun 29 07:07 /opt/Elastic/Agent/elastic-agent -> /opt/Elastic/Agent/data/elastic-agent-1a0f39/elastic-agent

and lastly, looking under /opt/Elastic/Agent/data/elastic-agent-1a0f39/, I see that there are just the logs and vault directories

ls -l /opt/Elastic/Agent/data/elastic-agent-1a0f39/
total 8
drwx------ 3 root root 4096 Jun 29 08:00 logs
drwxr-x--- 2 root root 4096 Jun 29 07:17 vault
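
The manual checks above (symlink present, target binary missing) can be combined into one small sketch. `check_agent_link` is a made-up helper name, and it only inspects the filesystem; it does not call elastic-agent itself.

```shell
#!/bin/sh
# Report whether the agent symlink points at an existing executable.
# Usage: check_agent_link <path-to-symlink>
# Returns 0 if the target is an executable file, 1 if the link is broken,
# 2 if the path is not a symlink at all.
check_agent_link() {
    link=$1
    if [ ! -L "$link" ]; then
        echo "not a symlink: $link"
        return 2
    fi
    target=$(readlink "$link")
    # -x and -d follow the symlink, so a dangling link fails the -x test.
    if [ -x "$link" ] && [ ! -d "$link" ]; then
        echo "ok: $link -> $target"
    else
        echo "broken: $link -> $target (target missing or not executable)"
        return 1
    fi
}

# Example call, using the path from the output above:
# check_agent_link /opt/Elastic/Agent/elastic-agent
```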

If I execute elastic-agent status on a system where the first installed version of the agent was 8.2 and which was then upgraded to 8.3, I get the expected response:

Message: (no message)
  * endpoint-security      (HEALTHY)
                           Protecting with policy {afdd95df-beed-42a7-8233-1bff0ad7ccb7}
  * filebeat_monitoring    (HEALTHY)
  * metricbeat_monitoring  (HEALTHY)

and listing the elastic-agent-1a0f39 directory there shows more entries than on the failing system.

sudo ls -l /Library/Elastic/Agent/data/elastic-agent-1a0f39
total 99400
drwxr-xr-x  23 root  wheel       736 Jun 29 12:18 downloads
-rwxr-xr-x   1 root  wheel  50870704 Jun 29 12:18 elastic-agent
drwxr-xr-x   5 root  wheel       160 Jun 29 12:20 install
drwx------   3 root  wheel        96 Jun 29 12:18 logs
drwxr-xr-x   3 root  wheel        96 Jun 29 12:18 run
-rw-------   1 root  wheel     16449 Jun 29 16:31 state.enc

That is unfortunate. It is tough to say what went wrong - is there anything else useful in the logs?

Is it possible for you to follow these steps for the agents that failed to upgrade:

  1. unenroll agent
  2. uninstall agent (this seems to have partially happened already somehow)
  3. install newer version
  4. re-enroll agent

Referenced from the same GitHub issue you shared:
