Hi there,
It's taken me a while to get to the bottom of this.
We're in an airgapped environment and upgrade agents from our local apt mirror of artifacts.elastic.co.
Recently, any agents that are upgraded via apt have been hanging with a status of "(STARTING) Waiting for initial configuration and composable variables".
This appears to be an issue with the packaged postinst script not correctly identifying the existing agent installation directory, and therefore not copying state.enc to the new directory.
I've narrowed it down to the following cases:
- Agent 8.12.x upgraded to any 8.13.x or above: state.enc is copied to the new installation directory, and the agent starts normally.
- Agent 8.13.x upgraded to any later 8.13.x or above: state.enc is not copied to the new directory, and the agent fails to join Fleet when restarted.
The main difference with 8.13.x is that the install directory name includes the version number, rather than just the commit hash as in 8.12.x and below. I'm not sure how relevant that is, but I suspect the actual problem is that the postinst script can't follow the symlink for /usr/share/elastic-agent/bin/elastic-agent, concludes that there is no old install directory, and therefore doesn't copy state.enc (or state.yml, if it exists).
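To make the suspected failure concrete, here is a minimal Python model of the migration step as I understand it: resolve the agent symlink, treat its parent as the old install directory, and copy state.enc and state.yml across if that directory differs from the new one. This is only a sketch under my own assumptions, not the real postinst script (which is shell). STATE_FILES, old_install_dir() and migrate_state() are hypothetical names, and the symlink and directory paths are passed in rather than hard-coded.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path

# Hypothetical model of the migration step; names do not come from the real script.
STATE_FILES = ("state.enc", "state.yml")


def old_install_dir(symlink: Path) -> Path | None:
    """Return the directory the agent symlink resolves into, or None if there is no symlink."""
    if not symlink.is_symlink():
        return None
    # Follow the full symlink chain, similar to `readlink -f`.
    return Path(os.path.realpath(symlink)).parent


def migrate_state(symlink: Path, new_dir: Path) -> list[str]:
    """Copy state files from the old install dir into new_dir and return the names copied."""
    old_dir = old_install_dir(symlink)
    # Skip migration if there is no old dir, or if it already is the new dir.
    if old_dir is None or not old_dir.is_dir() or old_dir.samefile(new_dir):
        return []
    copied = []
    for name in STATE_FILES:
        src = old_dir / name
        if src.is_file():
            shutil.copy2(src, new_dir / name)
            copied.append(name)
    return copied
```

If the symlink is missing, or already resolves into the new directory by the time this step runs, the function copies nothing, which would produce the same symptom I'm seeing with 8.13.x.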
At any rate, I'm quite convinced that the issue is with the postinst script, and it's causing significant problems when upgrading existing agents. The only workarounds I've found are to manually copy state.enc to the new install directory and restart the agent, or to re-enroll the agent into its policy. Neither is viable to automate or maintain at scale.
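For anyone else hitting this, here is a rough sketch of how affected hosts could be detected, plus the manual copy described above. It assumes, as described in this post, that state.enc should live in the directory the /usr/share/elastic-agent/bin/elastic-agent symlink resolves into. current_install_dir(), missing_state() and copy_state() are hypothetical helpers; you still have to supply the old install directory yourself and restart the agent afterwards.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def current_install_dir(symlink: Path) -> Path:
    """Directory the agent binary symlink currently resolves into (hypothetical helper)."""
    return Path(os.path.realpath(symlink)).parent


def missing_state(symlink: Path) -> bool:
    """True if the active install dir has no state.enc, i.e. the failure described above."""
    return not (current_install_dir(symlink) / "state.enc").is_file()


def copy_state(old_dir: Path, symlink: Path) -> list[str]:
    """Manual workaround: copy state.enc (and state.yml, if present) from a known old dir.

    Existing files in the new dir are never overwritten, so repeated runs are harmless.
    """
    new_dir = current_install_dir(symlink)
    copied = []
    for name in ("state.enc", "state.yml"):
        src = old_dir / name
        if src.is_file() and not (new_dir / name).exists():
            shutil.copy2(src, new_dir / name)
            copied.append(name)
    return copied
```

This only narrows down which hosts need attention; it doesn't fix the packaging problem itself.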
Can someone take a look and confirm whether they're seeing the same issue? I'm happy to open a GitHub issue if it's confirmed.
Cheers.