Fleet Server 8.8.1 on-prem boot issue

I have deployed Elasticsearch, Kibana and Fleet (version 8) on-premises on a CentOS 8 machine. Every time I boot the server, the Fleet service is up and running, but when I open the Fleet page, the agent on the local host is shown as offline. After I restart the service (systemctl restart elastic-agent.service), the agent comes back online. I made sure that ports 9200, 9300, 8220, 8221 and 5601 are all reachable.
In the log I see these entries:

[elastic_agent][info] Spawned new component fleet-server-default: Starting: spawned pid '1870'

[elastic_agent][info] Spawned new unit fleet-server-default-fleet-server-fleet_server-ed70d400-056e-11ee-bb2d-959895802987: Starting: spawned pid '1870'

[elastic_agent][info] Spawned new unit fleet-server-default: Starting: spawned pid '1870'

[elastic_agent][info] Component state changed fleet-server-default (STARTING->HEALTHY): Healthy: communicating with pid '1870'

[elastic_agent][error] Unit state changed fleet-server-default (STARTING->FAILED): Error - dial tcp connect: connection refused

[elastic_agent][error] Unit state changed fleet-server-default-fleet-server-fleet_server-ed70d400-056e-11ee-bb2d-959895802987 (STARTING->FAILED): Error - dial tcp connect: connection refused

Any ideas?
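
Since the log shows "connection refused" right after boot, it may help to repeat the port check at that moment rather than later, when everything has finished starting. The following is only a rough sketch: `check_ports` is a hypothetical helper (not part of any Elastic tooling), and it assumes all services listen on the local host.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Try one TCP connection per port; map each port to True (open) or False."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening or the attempt times out.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    # Ports taken from the post above; the host is an assumption.
    for port, is_open in check_ports("127.0.0.1", [9200, 9300, 8220, 8221, 5601]).items():
        print(f"port {port}: {'reachable' if is_open else 'NOT reachable'}")
```

Running this shortly after boot, before restarting elastic-agent, would show which port is still refusing connections when the unit goes to FAILED.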

I have had the same issue for several versions now.

  • Elastic-Stack 8.8.2

The Elastic Agents are managed by two Fleet Servers and connect to whichever one is reachable. When I restart one Fleet Server, they connect to the other. So far so good. However, if the backend (Elasticsearch) of the Elastic Agents running the Fleet integration restarts, those agents do not reconnect once the Elasticsearch nodes are available again.


We have to manually restart the Elastic Agents with the Fleet role (systemctl restart elastic-agent); otherwise the agents are displayed as offline. In the following screenshot we manually restarted one Elastic Agent with the Fleet role, and the agents started to reconnect.

(Ignore the version 8.8.1; it's a slightly older screenshot.)

The fix might seem easy: increase the backend timeout for the Elastic Agent with the Fleet role, or add a reconnect routine that periodically retries the connection to the backend.
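
To illustrate the kind of reconnect routine meant here, below is a minimal sketch of a retry loop with exponential backoff. It is not how Fleet Server is implemented; `wait_for_backend` and all of its parameters are hypothetical names chosen for this example, and it only checks that a TCP connection to the backend succeeds.

```python
import socket
import time


def wait_for_backend(host, port, attempts=5, base_delay=1.0, max_delay=30.0,
                     connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds, else False.

    The delay between attempts doubles each time, capped at max_delay.
    connect and sleep are injectable so the loop can be tested offline.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            with connect((host, port), timeout=3):
                return True
        except OSError:
            # Covers "connection refused" while the backend is still starting.
            if attempt == attempts:
                break
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

As a workaround outside the agent, something like `wait_for_backend("localhost", 9200, attempts=10)` could be run before `systemctl restart elastic-agent`, assuming Elasticsearch listens on localhost:9200.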


Thanks for the reply. Yes, that is exactly my issue.
Are you able to tell me where I can increase the timeout?

I'm asking myself the same thing. I don't know where to increase the timeout. It could be hardcoded in the routine of the Fleet integration that is applied to the Elastic Agent, in which case this should be tracked as an official enhancement or bug.

Or we have an issue with our setup. :)
