Some hosts fail to download Elastic Agent 8.15.1 during Fleet upgrade

I'm using Elastic Cloud 8.15.1.

I'm attempting to update a number of on-prem hosts to Elastic Agent 8.15.1 via Fleet and am receiving the following errors:

{"log.level":"info","@timestamp":"2024-09-17T19:12:04.387Z","log.origin":{"file.name":"http/downloader.go","file.line":340},"message":"download from https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.15.1-windows-x86_64.zip failed at 104.1MB/179.2MB (58.07% complete) @ 14.87MBps: read tcp 10.220.192.91:61994->34.120.127.130:443: wsarecv: An existing connection was forcibly closed by the remote host.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-09-17T19:12:04.484Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":183},"message":"unable to download package: 2 errors occurred:\n\t* package 'C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-ab6e68\\downloads\\elastic-agent-8.15.1-windows-x86_64.zip' not found: open C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-ab6e68\\downloads\\elastic-agent-8.15.1-windows-x86_64.zip: The system cannot find the file specified.\n\t* copying fetched package failed: read tcp 10.220.192.91:61994->34.120.127.130:443: wsarecv: An existing connection was forcibly closed by the remote host.\n\n; retrying (will be retry 7) in 51.725238094s.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

This is occurring on all the Windows hosts in this data center, as well as on a few Windows hosts in other data centers. It is not occurring on ANY Linux hosts, nor is it occurring on ALL Windows hosts, mostly just the ones in this data center. I have the same issue when downloading the Agent with Invoke-WebRequest in PowerShell. Based on this, I increased the download timeout there, and the download worked just fine from the command line. However, when I set agent.download.timeout: "900s" (the same value that worked from the command line in PowerShell), downloads from Elastic Agent are still failing.
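To compare failures across hosts, it can help to pull the progress numbers out of the downloader log lines. The following is a minimal sketch in Python, assuming the "failed at X/Y (Z% complete) @ R" wording shown in the log above. The function name `parse_download_failure` is made up for illustration, and the elapsed-time estimate assumes the reported rate is an average over the whole attempt, which I have not confirmed.

```python
import json
import re

# Matches the progress fragment seen in the downloader "message" field, e.g.
# "failed at 104.1MB/179.2MB (58.07% complete) @ 14.87MBps".
PROGRESS = re.compile(
    r"failed at (?P<done>[\d.]+)MB/(?P<total>[\d.]+)MB "
    r"\((?P<pct>[\d.]+)% complete\) @ (?P<rate>[\d.]+)MBps"
)


def parse_download_failure(line):
    """Return progress details from one JSON log line, or None if absent."""
    entry = json.loads(line)
    match = PROGRESS.search(entry.get("message", ""))
    if match is None:
        return None
    done = float(match["done"])
    rate = float(match["rate"])
    return {
        "timestamp": entry.get("@timestamp"),
        "done_mb": done,
        "total_mb": float(match["total"]),
        "percent": float(match["pct"]),
        "rate_mbps": rate,
        # Assumption: the rate is an average, so done / rate approximates
        # how long the attempt ran before the connection was closed.
        "approx_elapsed_s": round(done / rate, 1) if rate > 0 else None,
    }


# Sample built from the first log line in this post (fields trimmed).
sample_line = json.dumps({
    "log.level": "info",
    "@timestamp": "2024-09-17T19:12:04.387Z",
    "message": (
        "download from https://artifacts.elastic.co/downloads/beats/"
        "elastic-agent/elastic-agent-8.15.1-windows-x86_64.zip failed at "
        "104.1MB/179.2MB (58.07% complete) @ 14.87MBps: read tcp "
        "10.220.192.91:61994->34.120.127.130:443: wsarecv: An existing "
        "connection was forcibly closed by the remote host."
    ),
})

result = parse_download_failure(sample_line)
print(result)
```

Running this over the agent logs from several affected hosts would show whether the connection is closed at a similar point or after a similar duration each time, which may be useful to hand to a network team.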

Suggestions?

Hey @DougR,

It is not possible to configure this timeout from the Fleet UI at the moment. There is an open issue about that: [Fleet] The agent upgrade download timeout should be configurable. · Issue #4580 · elastic/elastic-agent · GitHub.

Based on a comment by Julia on that issue, one thing you can try is to configure it through the API, as described in this comment: [Fleet] Add agent policy API to add settings not yet supported by UI · Issue #158699 · elastic/kibana · GitHub
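For reference, here is a rough sketch in Python of what such an API override might look like. It assumes, based on my reading of the linked Kibana issue, that the agent policy update endpoint (`PUT /api/fleet/agent_policies/<policy id>`) accepts an `overrides` object and that the timeout sits under `agent.download.timeout`; please check the issue comment for the exact shape. The Kibana URL, policy ID, policy name, and API key below are placeholders, and `build_timeout_override_request` is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request


def build_timeout_override_request(kibana_url, policy_id, name, namespace,
                                   api_key, timeout="900s"):
    """Build (but do not send) a policy update carrying a timeout override."""
    body = {
        # The policy update is assumed to require the existing name and namespace.
        "name": name,
        "namespace": namespace,
        # Assumed shape of the override, per the linked Kibana issue.
        "overrides": {"agent": {"download": {"timeout": timeout}}},
    }
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/fleet/agent_policies/{policy_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "true",  # Kibana APIs expect this header on writes
            "Authorization": f"ApiKey {api_key}",
        },
    )


# Placeholder values for illustration only.
req = build_timeout_override_request(
    kibana_url="https://kibana.example.com",
    policy_id="my-policy-id",
    name="Windows hosts",
    namespace="default",
    api_key="REDACTED",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it; omitted here on purpose.
```

After applying a change like this, the rendered policy on an affected host should show whether the override actually reached the agent.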

Thanks. I guess I wasn't clear in my initial question - sorry!

I've already followed Julia's instructions on how to override the timeout through the API and haven't had any success after setting it to 900s. That value worked for me from the command line in PowerShell: the download actually took 640s, and the default timeout is 600s, so I wasn't missing it by much.

Setting the log level to DEBUG didn't give me any additional information regarding the download. I haven't had the opportunity to work with our network team to see what they may be able to see (I'm planning to do that today).

Is there anything else you may be aware of?

Thx.

I guess we would have to wait for [Fleet] The agent upgrade download timeout should be configurable. · Issue #4580 · elastic/elastic-agent · GitHub then :slightly_frowning_face: