Elastic Agent alternates between offline and healthy every 5 minutes

I have an Elastic Stack setup managed by the ECK operator on Kubernetes. I deployed a Fleet Server on Kubernetes and an Elastic Agent on a separate Alpine Linux VM for log shipping. The installation was successful, and the Elastic Agent shows as healthy. I have also configured the output to Kafka, and the data is reaching Kafka successfully. However, every 5 minutes the agent goes offline in Kibana and then automatically returns to a healthy state. Even while the agent shows as offline in Kibana, it is still running on the VM. I found only one error in the logs, shown below. (I am using Elastic Stack version 8.15.1.)

{
  "log.level": "error",
  "@timestamp": "2024-09-27T06:29:24.871Z",
  "log.origin": {
    "function": "github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet.(FleetGateway).doExecute",
    "file.name": "fleet/fleet_gateway.go",
    "file.line": 195
  },
  "message": "Cannot checkin in with fleet-server, retrying",
  "log": {
    "source": "elastic-agent"
  },
  "error": {
    "message": "fail to checkin to fleet-server: all hosts failed: 1 error occurred:\n\trequester 0/1 to host <URL_FOR_FLEETSERVER> errored: Post \"URL_FOR_FLEETSERVER/api/fleet/agents/<AGENT_ID>/checkin?\": EOF\n\n"
  },
  "request_duration_ns": 30008744594,
  "failed_checkins": 8,
  "retry_after_ns": 702760145637,
  "ecs.version": "1.6.0"
}


Agents are marked offline if there has been no successful check-in in the last 5 minutes. Since they come back online, this looks like a temporary issue. Does the agent host have access to Fleet Server?

Yes, the agent installed on the VM has access to Fleet Server. So far, I have changed the checkin_long_poll parameter to 30s (the default is 5m), and with that it works. But when I increase it above 30s, the agent goes back into the offline/healthy loop. There is a load balancer in between; I have also configured its timeouts to 5 minutes, but that still didn't work.

The Elastic Agent makes a long-polling request to Fleet Server to check for configuration changes, so it holds a connection to the server open for up to 5 minutes. In my case, configuring the timeouts at the load balancer to the required time solved the issue.
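
For anyone reading the error log above, the two duration fields already hint at the cause. Here is a minimal Python sketch that converts them to seconds and compares them against the 5-minute values discussed in this thread. The helper name long_poll_survives and the 30 s / 360 s idle timeouts are my own illustrative assumptions, not Elastic settings; the only real inputs are the two numbers copied from the log.

```python
NS_PER_SECOND = 1_000_000_000

# Values copied from the error log above.
REQUEST_DURATION_NS = 30_008_744_594
RETRY_AFTER_NS = 702_760_145_637

# Values mentioned in the thread: default long poll and the offline window.
DEFAULT_LONG_POLL_S = 5 * 60
OFFLINE_AFTER_S = 5 * 60


def ns_to_seconds(ns: int) -> float:
    """Convert a nanosecond duration from the agent log to seconds."""
    return ns / NS_PER_SECOND


def long_poll_survives(long_poll_s: float, idle_timeout_s: float) -> bool:
    """Hypothetical check: a long poll only completes if every hop in the
    path (here, a load balancer) keeps idle connections open longer than
    the poll itself lasts."""
    return idle_timeout_s > long_poll_s


request_duration_s = ns_to_seconds(REQUEST_DURATION_NS)
retry_after_s = ns_to_seconds(RETRY_AFTER_NS)

print(f"failed request lasted {request_duration_s:.1f} s")
print(f"next retry after {retry_after_s / 60:.1f} min")
print("retry gap exceeds offline window:", retry_after_s > OFFLINE_AFTER_S)

# Assumed idle timeouts, for illustration only.
print("5m poll, 30 s idle timeout:", long_poll_survives(DEFAULT_LONG_POLL_S, 30))
print("5m poll, 360 s idle timeout:", long_poll_survives(DEFAULT_LONG_POLL_S, 360))
```

The failed request lasted almost exactly 30 s, which would be consistent with something in the path closing the connection around that mark. The retry back-off of about 11.7 minutes is longer than the 5-minute offline window, which would explain why Kibana shows the agent as offline until the next successful check-in.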