Elastic Agent with fleet policy refuses to start properly

jerrac · June 20, 2022, 9:42pm

My Elastic Agent that runs the Fleet policy stopped working a week or two ago because it cannot start Fleet Server.

# elastic-agent status
Status: FAILED
Message: (no message)
Applications:
  * filebeat               (CONFIGURING)
                           Updating configuration
  * fleet-server           (FAILED)
                           Missed two check-ins
  * metricbeat             (HEALTHY)
                           Running
  * filebeat_monitoring    (CONFIGURING)
                           Updating configuration
  * metricbeat_monitoring  (HEALTHY)
                           Running
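When the status output is long, it can help to pull out only the applications that are not HEALTHY. Below is a minimal sketch that parses the plain-text output shown above. It assumes the 7.17-style layout (an "* name (STATE)" line followed by an indented message line); the function name unhealthy_apps is hypothetical, and other Agent versions may format the output differently.

```python
import re

# Matches lines like "  * fleet-server           (FAILED)" from the
# plain-text `elastic-agent status` output (7.17-style layout assumed).
APP_LINE = re.compile(r"^\s*\*\s+(\S+)\s+\((\w+)\)\s*$")


def unhealthy_apps(status_text):
    """Return (name, state, message) for every application not in HEALTHY state."""
    results = []
    lines = status_text.splitlines()
    for i, line in enumerate(lines):
        match = APP_LINE.match(line)
        if not match:
            continue
        name, state = match.groups()
        # The detail message, if present, is on the following indented line.
        message = ""
        if i + 1 < len(lines) and not APP_LINE.match(lines[i + 1]):
            message = lines[i + 1].strip()
        if state != "HEALTHY":
            results.append((name, state, message))
    return results
```

The text can come from something like subprocess.run(["elastic-agent", "status"], capture_output=True, text=True).stdout, run with the same privileges used for the command above.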

The logs in /opt/Elastic/Agent show a lot of connection refused errors.

2022-06-20T14:26:02.419-0700	WARN	status/reporter.go:236	Elastic Agent status changed to: 'degraded'
2022-06-20T14:26:02.419-0700	INFO	log/reporter.go:40	2022-06-20T14:26:02-07:00 - message: Application: fleet-server--7.17.3[eb389cc3-2383-46ba-996d-70409ed1f68f]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'
2022-06-20T14:26:02.989-0700	ERROR	fleet/fleet_gateway.go:205	Could not communicate with fleet-server Checking API will retry, error: fail to checkin to fleet-server: Post "http://localhost:8220/api/fleet/agents/eb389cc3-2383-46ba-996d-70409ed1f68f/checkin?": dial tcp 127.0.0.1:8220: connect: connection refused
2022-06-20T14:27:02.427-0700	ERROR	status/reporter.go:236	Elastic Agent status changed to: 'error'
2022-06-20T14:27:02.428-0700	ERROR	log/reporter.go:36	2022-06-20T14:27:02-07:00 - message: Application: fleet-server--7.17.3[eb389cc3-2383-46ba-996d-70409ed1f68f]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'
2022-06-20T14:28:24.746-0700	ERROR	fleet/fleet_gateway.go:205	Could not communicate with fleet-server Checking API will retry, error: fail to checkin to fleet-server: Post "http://localhost:8220/api/fleet/agents/eb389cc3-2383-46ba-996d-70409ed1f68f/checkin?": dial tcp 127.0.0.1:8220: connect: connection refused
2022-06-20T14:33:14.745-0700	ERROR	fleet/fleet_gateway.go:205	Could not communicate with fleet-server Checking API will retry, error: fail to checkin to fleet-server: Post "http://localhost:8220/api/fleet/agents/eb389cc3-2383-46ba-996d-70409ed1f68f/checkin?": dial tcp 127.0.0.1:8220: connect: connection refused
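"connect: connection refused" on 127.0.0.1:8220 means nothing is accepting connections on the port the agent uses to check in, which is consistent with the fleet-server process not running. A quick way to confirm that is a plain TCP connect test. This is a minimal sketch; the helper name port_is_listening is hypothetical, and the default port 8220 is taken from the check-in URL in the logs above.

```python
import socket


def port_is_listening(host="127.0.0.1", port=8220, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    A False result corresponds to the "connection refused" errors in the
    agent logs: no process is accepting connections on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are both OSError subclasses.
        return False
```

This only shows whether something is listening; it does not check that the listener is actually Fleet Server or that it is healthy.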

I eventually dug my way into /opt/Elastic/Agent/data/elastic-agent-1993ee/logs/default and found the fleet-server-json.log.

{"log.level":"info","service.name":"fleet-server","version":"7.17.3","commit":"298a11f","pid":6853,"ppid":6769,"exe":"/opt/Elastic/Agent/data/elastic-agent-1993ee/install/fleet-server-7.17.3-linux-x86_64/fleet-server","args":["--agent-mode","-E","logging.level=info","-E","http.enabled=true","-E","http.host=unix:///opt/Elastic/Agent/data/tmp/default/fleet-server/fleet-server.sock","-E","logging.json=true","-E","logging.ecs=true","-E","logging.files.path=/opt/Elastic/Agent/data/elastic-agent-1993ee/logs/default","-E","logging.files.name=fleet-server-json.log","-E","logging.files.keepfiles=7","-E","logging.files.permission=0640","-E","logging.files.interval=1h","-E","path.data=/opt/Elastic/Agent/data/elastic-agent-1993ee/run/default/fleet-server--7.17.3"],"@timestamp":"2022-06-20T21:24:59.625Z","message":"Boot fleet-server"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2022-06-20T21:24:59.626Z","message":"starting communication connection back to Elastic Agent"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2022-06-20T21:24:59.626Z","message":"waiting for Elastic Agent to send initial configuration"}
{"log.level":"error","service.name":"fleet-server","error.message":"only 1 fleet-server input can be defined accessing config","@timestamp":"2022-06-20T21:25:00.148Z","message":"Exiting"}
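Because fleet-server-json.log is one JSON object per line, the fatal error can be pulled out without reading the whole file by hand. Below is a minimal sketch that relies only on the keys visible in the log above ("log.level", "@timestamp", "error.message", "message"); the function name fleet_server_errors is hypothetical.

```python
import json


def fleet_server_errors(lines):
    """Yield (timestamp, error text) for error-level entries in fleet-server-json.log."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines, e.g. a truncated last write
        if entry.get("log.level") == "error":
            # Prefer the specific error text; fall back to the generic message.
            yield entry.get("@timestamp"), entry.get("error.message") or entry.get("message")
```

Usage would be along the lines of opening the log under .../data/elastic-agent-<hash>/logs/default/ (the hash directory differs per install) and iterating over fleet_server_errors(f).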

That last line about only 1 fleet-server input isn't about there being more than one instance of Fleet Server running, right? I've only ever had the 1 instance, and I just reinstalled Elastic Agent and rebooted the server, so there shouldn't be another instance hanging around.

One thought does occur: when I was messing with the policy in Kibana, I got an error when I "upgraded" the Fleet policy. It complained that the name already existed. I just stuck a "b" on the end of the name to rename it, and it upgraded just fine.

Any help would be appreciated.

Thanks!

jerrac · June 21, 2022, 6:14pm

I eventually found a policy with another instance of the Fleet Server integration in it. Deleting that seems to have worked, but in the process I've apparently caused Fleet to switch from HTTPS to HTTP. That means I now get to update all my Elastic Agents to use HTTP...
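For anyone hitting the same "only 1 fleet-server input can be defined" error, a stray extra copy of the Fleet Server integration (possibly left over from a renamed upgrade like the one described earlier in the thread) is easier to spot by listing every Fleet Server integration per agent policy. Below is a minimal sketch, assuming Kibana's Fleet API endpoint GET /api/fleet/package_policies returns JSON with an "items" list whose entries carry "name", "policy_id" and "package.name", and that the integration's package name is "fleet_server". The helper names are hypothetical, pagination is ignored, and the response shape should be checked against your Kibana version.

```python
import base64
import json
import urllib.request
from collections import defaultdict


def fetch_package_policies(kibana_url, username, password):
    """Fetch package policies from Kibana (endpoint and response shape are assumptions)."""
    req = urllib.request.Request(kibana_url.rstrip("/") + "/api/fleet/package_policies")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]


def fleet_server_integrations(items):
    """Map each agent policy ID to the names of its Fleet Server integrations."""
    by_policy = defaultdict(list)
    for item in items:
        if item.get("package", {}).get("name") == "fleet_server":
            by_policy[item["policy_id"]].append(item["name"])
    return dict(by_policy)


def duplicated(by_policy):
    """Keep only agent policies that contain more than one Fleet Server integration."""
    return {pid: names for pid, names in by_policy.items() if len(names) > 1}
```

Checking the full per-policy listing, not just the duplicates, also shows Fleet Server integrations sitting in policies you did not expect.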

Good thing this is all in dev...

system · July 19, 2022, 8:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category / tags	Replies	Views	Activity
Fleet Server Agent not listening	Elastic Agent, fleet	5	2296	April 26, 2023
Fleet agent failed to connect to fleet server after assigning new agent policy	Elasticsearch, fleet	1	626	November 5, 2021
Fleet Server not starting with Elastic Agent (Error: fleet-server failed: context canceled)	Beats, fleet	2	945	October 3, 2022
Cannot checking in with fleet-server	Elastic Agent, elastic-stack-monitoring	0	19	February 17, 2025
Fail to checkin to fleet-server	Elastic Agent, fleet	17	6753	July 10, 2023