Elastic-agent >= 8.13.x - Apt upgrade not copying state.enc to new installation dir

Hi there,

It's taken me a while to get to the bottom of this.

We're in an airgapped environment and upgrade agents from our local apt mirror of artifacts.elastic.co.

Recently, any agents that are upgraded via apt have been hanging with a status of "(STARTING) Waiting for initial configuration and composable variables".

This appears to be an issue with the packaged postinst script not correctly identifying the existing agent installation directory, and therefore not copying state.enc to the new directory.

I've narrowed it down to the following cases:

  • Agent 8.12.x upgraded to any 8.13.x or above: state.enc is copied to the new installation directory, and the agent starts normally.
  • Agent 8.13.x upgraded to any 8.13.x or above: state.enc is not copied to the new directory, and the agent fails to join Fleet when restarted.

The main difference with 8.13.x is the inclusion of the version number in the install directory, rather than just the commit hash as in 8.12.x and below. While I'm not sure how relevant that is, I suspect the actual problem lies with the postinst script being unable to follow the symlink for /usr/share/elastic-agent/bin/elastic-agent, determining that there is no old install directory, and therefore not copying state.enc (or state.yml, if it exists).
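
To illustrate the suspected failure mode, here is a minimal Python sketch of symlink-based detection. It is not the packaged postinst logic; the function name `find_old_install_dir` is hypothetical, and the assumption is simply that the old install directory is the one containing the symlink's target. If the symlink is missing or dangling, there is nothing to resolve and no old directory is found.

```python
import os
from typing import Optional


def find_old_install_dir(symlink_path: str) -> Optional[str]:
    """Return the directory holding the symlink's target, or None.

    Hypothetical helper: mirrors the idea of following
    /usr/share/elastic-agent/bin/elastic-agent to the versioned directory.
    """
    # A path that is not a symlink (or does not exist) cannot be followed.
    if not os.path.islink(symlink_path):
        return None
    target = os.path.realpath(symlink_path)
    # A dangling symlink points at a target that no longer exists.
    if not os.path.exists(target):
        return None
    return os.path.dirname(target)
```

A `None` result here corresponds to the case where the script concludes there is no previous installation, so no state file would be carried over.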

At any rate, I'm quite convinced that the issue is with the postinst script, and it's causing significant problems when it comes to upgrading existing agents. The only workarounds I've found are to manually copy state.enc to the new install directory and restart the agent, or to re-enroll the agent into its policy. Neither of these is viable to automate or maintain at scale.
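
For anyone who needs the manual copy in the meantime, this is a rough Python sketch of that step. The function name `copy_state_files` is hypothetical, and both directories are passed in explicitly because the exact old and new install paths depend on the versions involved. It copies state.enc and state.yml only if they exist in the old directory.

```python
import os
import shutil
from typing import List

# State files mentioned above; state.yml may not be present.
STATE_FILES = ("state.enc", "state.yml")


def copy_state_files(old_dir: str, new_dir: str) -> List[str]:
    """Copy any existing state files from old_dir to new_dir.

    Returns the names of the files that were copied.
    """
    copied = []
    for name in STATE_FILES:
        src = os.path.join(old_dir, name)
        if os.path.isfile(src):
            # copy2 keeps permissions and timestamps along with the contents.
            shutil.copy2(src, os.path.join(new_dir, name))
            copied.append(name)
    return copied
```

The agent still has to be restarted afterwards, and an empty return value means there was nothing to copy from the directory given.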

Can someone take a look and confirm whether they are seeing the same issue? I'm happy to open a GitHub issue if it's confirmed.

Cheers.

@ceekay yes, please open a GitHub issue.

I strongly suspect the issue is with dev-tools/packaging/templates/linux/postrm.sh.tmpl (at commit ca5a07cb239d9770e50d7ec6be34a00514cdba9f in elastic/elastic-agent on GitHub), where the symlink is getting deleted on an upgrade.

Thanks @Lee_Hinman

This problem has stopped me in my tracks, so I got impatient and opened one already: "Agent .deb install: state.enc not copied during Elastic Agent upgrade from 8.13 and above" (Issue #5101 in elastic/elastic-agent on GitHub).

Looks like it's been assigned just now.

Cheers