Elastic Agent with fleet policy refuses to start properly

My Elastic Agent that runs the Fleet policy stopped working a week or two ago because it cannot start Fleet Server.

# elastic-agent status
Status: FAILED
Message: (no message)
Applications:
  * filebeat               (CONFIGURING)
                           Updating configuration
  * fleet-server           (FAILED)
                           Missed two check-ins
  * metricbeat             (HEALTHY)
                           Running
  * filebeat_monitoring    (CONFIGURING)
                           Updating configuration
  * metricbeat_monitoring  (HEALTHY)
                           Running

The logs in /opt/Elastic/Agent show a lot of connection refused errors.

2022-06-20T14:26:02.419-0700	WARN	status/reporter.go:236	Elastic Agent status changed to: 'degraded'
2022-06-20T14:26:02.419-0700	INFO	log/reporter.go:40	2022-06-20T14:26:02-07:00 - message: Application: fleet-server--7.17.3[eb389cc3-2383-46ba-996d-70409ed1f68f]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'
2022-06-20T14:26:02.989-0700	ERROR	fleet/fleet_gateway.go:205	Could not communicate with fleet-server Checking API will retry, error: fail to checkin to fleet-server: Post "http://localhost:8220/api/fleet/agents/eb389cc3-2383-46ba-996d-70409ed1f68f/checkin?": dial tcp 127.0.0.1:8220: connect: connection refused
2022-06-20T14:27:02.427-0700	ERROR	status/reporter.go:236	Elastic Agent status changed to: 'error'
2022-06-20T14:27:02.428-0700	ERROR	log/reporter.go:36	2022-06-20T14:27:02-07:00 - message: Application: fleet-server--7.17.3[eb389cc3-2383-46ba-996d-70409ed1f68f]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'
2022-06-20T14:28:24.746-0700	ERROR	fleet/fleet_gateway.go:205	Could not communicate with fleet-server Checking API will retry, error: fail to checkin to fleet-server: Post "http://localhost:8220/api/fleet/agents/eb389cc3-2383-46ba-996d-70409ed1f68f/checkin?": dial tcp 127.0.0.1:8220: connect: connection refused
2022-06-20T14:33:14.745-0700	ERROR	fleet/fleet_gateway.go:205	Could not communicate with fleet-server Checking API will retry, error: fail to checkin to fleet-server: Post "http://localhost:8220/api/fleet/agents/eb389cc3-2383-46ba-996d-70409ed1f68f/checkin?": dial tcp 127.0.0.1:8220: connect: connection refused

I eventually dug my way into /opt/Elastic/Agent/data/elastic-agent-1993ee/logs/default and found the fleet-server-json.log.

{"log.level":"info","service.name":"fleet-server","version":"7.17.3","commit":"298a11f","pid":6853,"ppid":6769,"exe":"/opt/Elastic/Agent/data/elastic-agent-1993ee/install/fleet-server-7.17.3-linux-x86_64/fleet-server","args":["--agent-mode","-E","logging.level=info","-E","http.enabled=true","-E","http.host=unix:///opt/Elastic/Agent/data/tmp/default/fleet-server/fleet-server.sock","-E","logging.json=true","-E","logging.ecs=true","-E","logging.files.path=/opt/Elastic/Agent/data/elastic-agent-1993ee/logs/default","-E","logging.files.name=fleet-server-json.log","-E","logging.files.keepfiles=7","-E","logging.files.permission=0640","-E","logging.files.interval=1h","-E","path.data=/opt/Elastic/Agent/data/elastic-agent-1993ee/run/default/fleet-server--7.17.3"],"@timestamp":"2022-06-20T21:24:59.625Z","message":"Boot fleet-server"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2022-06-20T21:24:59.626Z","message":"starting communication connection back to Elastic Agent"}
{"log.level":"info","service.name":"fleet-server","@timestamp":"2022-06-20T21:24:59.626Z","message":"waiting for Elastic Agent to send initial configuration"}
{"log.level":"error","service.name":"fleet-server","error.message":"only 1 fleet-server input can be defined accessing config","@timestamp":"2022-06-20T21:25:00.148Z","message":"Exiting"}
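
Since fleet-server-json.log is one JSON object per line, the error entries can be pulled out with a few lines of Python instead of scrolling through it. This is only a sketch: the function name `error_entries` is hypothetical, and it relies on nothing beyond the `log.level`, `error.message`, and `message` fields visible in the lines above.

```python
import json
from typing import Iterable, Iterator


def error_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries whose log.level is 'error'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a complete JSON object.
            continue
        if isinstance(entry, dict) and entry.get("log.level") == "error":
            yield entry


if __name__ == "__main__":
    # The failing line from the log above, used as sample input.
    sample = [
        '{"log.level":"error","service.name":"fleet-server",'
        '"error.message":"only 1 fleet-server input can be defined accessing config",'
        '"@timestamp":"2022-06-20T21:25:00.148Z","message":"Exiting"}'
    ]
    for entry in error_entries(sample):
        print(entry.get("@timestamp"), entry.get("error.message"), entry.get("message"))
```

To run it against the real file, pass an open file handle instead of `sample`, since iterating over a file yields its lines.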

That last line about 1 fleet-server input isn't about there being more than one instance of Fleet Server running, right? I've only ever had the one instance, and I just reinstalled Elastic Agent and rebooted the server, so there shouldn't be another instance hanging around.

One thought does occur: when I was messing with the policy in Kibana, I got an error when I "upgraded" the Fleet policy. It complained that the name already existed. I just stuck a "b" on the end of the name to rename it, and it upgraded just fine.

Any help would be appreciated.

Thanks!

I eventually found a policy with another instance of the Fleet Server integration in it. Deleting that seems to have worked, but in the process I've apparently caused Fleet to switch from https to http, which means I get to go update all my Elastic Agents to use http...
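
For anyone hitting the same error, here is a rough sketch of how the duplicate could be spotted without clicking through every policy in Kibana. It assumes you have already fetched the list of integration (package) policies as JSON, for example from Kibana's Fleet API, and that each item carries `package.name`, `policy_id`, and `name` fields with the Fleet Server package named `fleet_server`. Those field names are my assumption about the response shape, so check them against your Kibana version. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def fleet_server_integrations(package_policies):
    """Group Fleet Server integration names by the agent policy that holds them."""
    by_policy = defaultdict(list)
    for item in package_policies:
        # Assumed shape: {"name": ..., "policy_id": ..., "package": {"name": ...}}
        if item.get("package", {}).get("name") == "fleet_server":
            by_policy[item.get("policy_id")].append(item.get("name"))
    return dict(by_policy)


def policies_with_duplicates(package_policies):
    """Return only the agent policies that hold more than one Fleet Server integration."""
    grouped = fleet_server_integrations(package_policies)
    return {pid: names for pid, names in grouped.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real cluster.
    sample = [
        {"name": "fleet_server-1", "policy_id": "policy-a", "package": {"name": "fleet_server"}},
        {"name": "fleet_server-1b", "policy_id": "policy-a", "package": {"name": "fleet_server"}},
        {"name": "system-1", "policy_id": "policy-a", "package": {"name": "system"}},
    ]
    print(policies_with_duplicates(sample))
```

Any agent policy that shows up in the result has more than one Fleet Server integration attached, which would match the "only 1 fleet-server input can be defined" error.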

Good thing this is all in dev...
