Upgrade APM to Fleet/Elastic Agent - broken Java agents

I tried to upgrade from the standalone APM Server to Fleet / the APM integration, following the guide here: elastic.co/guide/en/apm/guide/7.17/upgrade-to-apm-integration.html#apm-integration-upgrade-steps-ess

But after switching to the Elastic Agent at:

"Elastic Cloud will now create a Fleet Server instance to contain the new APM integration, and then will shut down the old APM server instance. Within minutes your data should begin appearing in the APM app again."

All my Java agents failed to connect and started throwing:
ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Server returned HTTP response code: 502 for URL: https://rbx-logs.apm.us-east-1.aws.found.io/intake/v2/events, response code is 502

I noticed that the cloud deployment no longer has an APM endpoint, only a Fleet endpoint, so I tried copying that and pointing my agents at it, but now they get:
ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: https://rbx-logs.fleet.us-east-1.aws.found.io/intake/v2/events, response code is 404

This is an Elastic Cloud deployment on 7.16.3 with Java APM agents v1.28.4. The upgrade guide seemed pretty simple, but I clearly missed some important config. In the APM integration I thought it was odd that the host and URL were localhost:8200 - should one of those be the external Fleet endpoint?
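For anyone double-checking their agent configuration while debugging this: the Java agent reads its server address from the `server_url` option, which can be set in `elasticapm.properties`, as a JVM system property, or as an environment variable. The agent must point at the APM endpoint (the one serving `/intake/v2/events`), not the Fleet Server endpoint. A minimal sketch, with placeholder URL and token values:

```properties
# elasticapm.properties — placed next to the agent jar or on the classpath.
# The URL below is a placeholder; use your own deployment's APM endpoint.
server_url=https://my-deployment.apm.us-east-1.aws.found.io
service_name=my-service
# Secret token from the APM integration settings, if token auth is enabled.
secret_token=changeme
```

The same setting can also be passed as `-Delastic.apm.server_url=...` on the JVM command line, or via the `ELASTIC_APM_SERVER_URL` environment variable.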

Many thanks

FYI for anyone else who comes across this: I believe the root issue was the host/URL configured in the Elastic APM integration. I didn't set these, as the upgrade document didn't say to touch anything, and I don't even know if there was a chance to change them. But in the broken deployment it auto-configured both as localhost:

I spun up a new deployment/cluster and retried the upgrade there, and this time it properly filled in the correct bind IP and external host:

I suspect this may be a bug somewhere in Kibana? We had an older logging cluster that I've upgraded a few times; it probably started on stack 7.5. I wonder if there was some setting it expected to exist that wasn't present in my cluster. I thought the image below was just a result of the APM server being decommissioned:

but in my new cluster it does have an APM endpoint to copy... so now I believe it was greyed out only because it was set to localhost.
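Based on what the working deployment showed, the APM integration's Host/URL settings (Fleet > Agent policies > the APM integration) should look roughly like the following after a healthy upgrade. These values are illustrative, not copied from a real deployment:

```yaml
# APM integration settings after a successful upgrade (illustrative values).
host: "0.0.0.0:8200"   # bind address the APM input listens on inside Fleet Server
url: "https://my-deployment.apm.us-east-1.aws.found.io:443"   # external URL agents connect to
```

In the broken deployment, both fields had been auto-filled with localhost, which explains the 502 on the old APM hostname.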

I've just ditched the old cluster and may restore some of the logs/metrics/settings/users to the new one. If anyone on the APM/Fleet side can comment on whether there was a better way to salvage it, that might help future readers. It might also help to add a note to the upgrade docs about a setting to verify before committing, but I really don't remember seeing an option to set the host/URL during the upgrade.

Sorry, it looks like you bumped into an issue that we recently discovered: https://github.com/elastic/kibana/issues/123570. This has been fixed in 7.17.0.

Sweet, thanks! Good to know what it was - the more I dug the more it seemed like it could be a bug.

