Update to 8.3.1 from 8.3.0 has broken Fleet - please help!

Upgraded to 8.3 a few days ago - everything working fine.
Updated to 8.3.1 this morning and now the elastic-agent service won't start on my single on-prem node - this is also the Fleet Server.

Error in the logs is:
Error: could not read overwrites: fail to read configuration /etc/elastic-agent/fleet.enc for the elastic-agent: fail to decode bytes: cipher: message authentication failed

I've not changed anything in the config files at all prior to the update.
I stopped all the Elastic services on the host (elasticsearch, kibana, logstash, metricbeat, filebeat, elastic-agent), ran the update, and restarted the services again.
Everything restarted except the 'elastic-agent' service, which fails with the error described above.

I'm having the same issue, and I think I made it worse: I tried recreating fleet.enc and checking the permissions on it, which seems to have broken things further. Good thing I took a snapshot.

Update: I downgraded only the agent to 8.3.0 with
apt install elastic-agent=8.3.0
and my Fleet Server is working again. @finbarr996

That may get you working until Elastic can fix this.
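
For anyone else applying this workaround, here is a rough sketch of the downgrade plus a package hold, so that a later apt upgrade does not pull the agent forward again. The run and downgrade_agent helpers and the DRY_RUN variable are names made up for this example; the apt install line is the one from the post above, and apt-mark hold is the standard apt way to pin a package. By default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: downgrade elastic-agent to a known-good version and hold it.
# DRY_RUN, run and downgrade_agent are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

downgrade_agent() {
    version="$1"
    # Install the specific older package version (as in the post above).
    run sudo apt install "elastic-agent=${version}"
    # Keep 'apt upgrade' from moving the agent to a newer version again.
    run sudo apt-mark hold elastic-agent
    # Bring the service back up on the downgraded version.
    run sudo systemctl restart elastic-agent
}

# Dry run: prints the three commands for version 8.3.0.
downgrade_agent 8.3.0
```

Setting DRY_RUN=0 runs the commands for real. Once a fixed release is available, sudo apt-mark unhold elastic-agent releases the hold.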


Hi Nathan,
Thank you for the tip - that worked perfectly - everything is back up and running again!
I wasn't aware of the ability to use apt to downgrade to a previous version - so thank you for that!

Hopefully Elastic will fix this soon.

Cheers,
John.

@finbarr996 How did you run the upgrade? Could you provide the steps?
Trying to get some repro case here.
Just tested the upgrade via fleet (initiated from Kibana) from 8.3.0 to 8.3.1 and it worked fine.

To upgrade the initial agent (the Fleet Server), I ran apt update and apt upgrade.

After that, the elastic-agent.service was not able to start, so I could not start my fleet server. I got the same error that John got in his initial message.

I upgraded my single on-prem Elastic node by stopping all the Elastic services and then running apt update and apt upgrade, so Elasticsearch, Kibana, Logstash, Metricbeat and elastic-agent were all upgraded from 8.3 to 8.3.1. I then rebooted the box and started the services.

Elastic-agent was the only service that wouldn't start. It seems to me that, for whatever reason, elastic-agent was no longer able to decrypt the fleet.enc file and read its configuration from there, so it failed to start.
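
If you want to confirm you are hitting the same failure before rolling back, a small check along these lines may help. It is only a sketch: has_fleet_enc_error is a hypothetical helper that reads log text on standard input and looks for the message quoted in the original post. It assumes the agent's startup error ends up in whatever log you feed it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the log text on stdin contains the
# decrypt failure reported in this thread.
has_fleet_enc_error() {
    grep -qF 'cipher: message authentication failed'
}

# Example input: the error line quoted in the original post.
sample='Error: could not read overwrites: fail to read configuration /etc/elastic-agent/fleet.enc for the elastic-agent: fail to decode bytes: cipher: message authentication failed'

if printf '%s\n' "$sample" | has_fleet_enc_error; then
    echo "fleet.enc decrypt failure found"
fi
```

On a systemd host, piping the output of journalctl -u elastic-agent --no-pager into the function should work in the same way, assuming the service logs to the journal.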

I hope this helps.

Confirmed: 8.3.1 fails for Fleet updates. My host is CentOS 8 Stream, and it also happens on Oracle Linux 8.3.

I can't recall the exact error message, as it was several days ago, but it was about the payload size being outside what was expected. This was an error a while ago.

Having rolled back to 8.3, which works perfectly, I just upgraded to 8.3.2 and the elastic-agent service fails to start with exactly the same error as in my original post.

I'll be rolling back to 8.3 again I guess.

I experienced this error going from 8.3.1 to 8.3.2 (RHEL7, using the package repos for elastic-agent). Downgrading back to 8.3.1 fixed it.

I'm seeing the same thing on Ubuntu 20.04 (8.3.2).