Fleet Server 8.8.1 on-prem boot issue

johnjohnsp1 · June 30, 2023, 12:47pm

Hi,
I have deployed Elasticsearch, Kibana, and Fleet (version 8) on-premises on a CentOS 8 host. Every time I boot the server, the Fleet service is up and running, but when I open the Fleet page the agent on the local host is shown as offline. After I restart the service (systemctl restart elastic-agent.service), the agent comes back online. I have verified that ports 9200, 9300, 8220, 8221, and 5601 are all reachable.
In the log I see these entries:

[elastic_agent][info] Spawned new component fleet-server-default: Starting: spawned pid '1870'

[elastic_agent][info] Spawned new unit fleet-server-default-fleet-server-fleet_server-ed70d400-056e-11ee-bb2d-959895802987: Starting: spawned pid '1870'

[elastic_agent][info] Spawned new unit fleet-server-default: Starting: spawned pid '1870'

[elastic_agent][info] Component state changed fleet-server-default (STARTING->HEALTHY): Healthy: communicating with pid '1870'

[elastic_agent][error] Unit state changed fleet-server-default (STARTING->FAILED): Error - dial tcp 192.168.1.67:9200: connect: connection refused

[elastic_agent][error] Unit state changed fleet-server-default-fleet-server-fleet_server-ed70d400-056e-11ee-bb2d-959895802987 (STARTING->FAILED): Error - dial tcp 192.168.1.67:9200: connect: connection refused

Any ideas?
Thanks

matled · June 30, 2023, 1:04pm

I have had the same issue for several versions.

Elastic Stack 8.8.2

The Elastic Agents are managed by two Fleet Servers. They connect to whichever one is reachable. When I restart one Fleet Server, they connect to the other. So far so good. However, if the backend of the Elastic Agents with the Fleet integration (Elasticsearch) restarts, the Elastic Agents with the Fleet role do not reconnect once the Elasticsearch nodes are available again.

We need to manually restart the Elastic Agents with the Fleet role (systemctl restart elastic-agent). Otherwise the agents are displayed as offline. In the following screenshot we have manually restarted one Elastic Agent with the Fleet role, and the agents start to reconnect.

(Ignore the version 8.8.1; it is a slightly older screenshot.)

The fix might seem easy: increase the backend timeout for the Elastic Agent with the Fleet role, or add a reconnect routine that retries the backend connection from time to time.
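Until something like that exists in the agent itself, the "retry from time to time" idea can be approximated from the outside. The following is only a rough sketch of a workaround, not how Fleet Server handles reconnects: it probes the Elasticsearch port with an increasing delay and restarts the agent once the port accepts connections. The function names are made up for illustration, the address in the usage note is the one from the log in the first post, and the restart step assumes the script runs with enough privileges to call systemctl.

```python
import socket
import subprocess
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


def wait_for_port(host, port, attempts=10, delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Probe host:port up to `attempts` times.

    The pause between probes doubles after every failure, capped at `max_delay`.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False


def restart_agent_when_ready(host, port, **kwargs):
    """Restart elastic-agent once the backend port is reachable (hypothetical helper)."""
    if not wait_for_port(host, port, **kwargs):
        return False
    # Same command as the manual workaround described in this thread.
    subprocess.run(["systemctl", "restart", "elastic-agent"], check=True)
    return True
```

Calling `restart_agent_when_ready("192.168.1.67", 9200)` shortly after boot would then do the manual restart automatically once Elasticsearch is listening. Note that an open port only shows that something is accepting TCP connections there; it does not prove the cluster is healthy.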

johnjohnsp1 · June 30, 2023, 1:28pm

Thanks for the reply. Yes, that is exactly my issue.
Can you tell me where I can increase the timeout?

matled · June 30, 2023, 2:08pm

I'm asking myself the same question. I don't know where to increase the timeout. It could be hardcoded in the routine of the Fleet integration that is applied to the Elastic Agent. If so, this should be tracked as an official enhancement request or bug.

Or we have an issue with our setup.

