Error when installing fleet server

When attempting to install a Fleet Server for a cluster running on an on-prem installation of Elastic Cloud Enterprise (ECE), I get the following error:

user@fleet01:/opt/elastic-agent$ sudo elastic-agent enroll -f --fleet-server-es=https://d48f*****.elastic.bentonvillek12.org:9243 --fleet-server-service-token=AAEAAWV*****
The Elastic Agent is currently in BETA and should not be used in production

2021-05-27T20:36:35.688-0500    INFO    cmd/enroll_cmd.go:300   Generating self-signed certificate for Fleet Server
2021-05-27T20:36:38.486-0500    INFO    cmd/enroll_cmd.go:643   Fleet Server - Starting
2021-05-27T20:36:39.488-0500    INFO    cmd/enroll_cmd.go:643   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:36:45.502-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:36:51.520-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:36:57.529-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:03.539-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:09.553-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:15.561-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:21.576-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:27.587-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:33.612-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:39.626-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:45.637-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:51.654-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:37:57.667-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:38:03.677-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:38:09.689-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:38:15.700-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:38:21.722-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:38:27.733-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
2021-05-27T20:38:33.744-0500    INFO    cmd/enroll_cmd.go:648   Fleet Server - Waiting on default policy with Fleet Server integration
Error: fleet-server never started by elastic-agent daemon: context canceled
user@fleet01:/opt/elastic-agent$

I'm running Elasticsearch 7.13 and the 7.13 version of the agent.
Agent logs are here

Any ideas on where to go next?

Hi @Jonathon_Penn, it seems the problem is that fleet-server cannot find an Elastic Agent policy that contains a fleet-server integration. Could you share a screenshot of the policies listed in Fleet under Policies? Do you have one specifically for fleet-server, with the fleet-server integration configured? Is this a fresh cluster, or was it upgraded from a previous version?

This is a freshly created cluster. I have destroyed and re-created it several times in the course of troubleshooting.



Based on your description of the problem, I also tried manually adding a policy with the fleet-server integration, but this did not fix the issue. Does it need to be named something special to be detected?

After some more troubleshooting, I discovered that the "Default Fleet Server policy" is only created on a cluster if that cluster does not have an APM node. I was able to create a cluster without an APM node, then add an APM node after creation. This gave me the "Default Fleet Server policy" and allowed the agent to connect. Thanks for pointing me in the right direction.

Glad you found a solution. The Default Fleet Server policy should also have been created without the apm node as soon as a user with superuser permissions logs in for the first time. I'll investigate this further on our end.

I have the same problem. How do I resolve the APM issue? I have a new installation, but I can't get the Fleet Server running; I get the same error.

I can confirm similar behavior using 7.13.
So I checked the Policies tab and added the following parameter to the `elastic-agent enroll` command:
--fleet-server-policy=policy-elastic-agent-on-cloud

Once this was done, I was able to proceed with fleet server setup.

@Jonathon_Penn I missed a very important detail on my end: you are running on ECE. The ECE release that can run fleet-server is not out yet, so the only option is to run your own fleet-server.

When running your own fleet-server, I strongly recommend NOT using the fleet-server policy that is automatically set up with policy-elastic-agent-on-cloud. Instead, create a new policy and add the fleet-server integration to it. This gives you full control over it.

Thanks for pointing that out. It's exactly the place where I got stuck using fleet in ECE. Once a new policy with fleet-server integration was created the fleet server started correctly.
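
For anyone landing here later, the workaround discussed above can be sketched as a small dry-run helper. It only prints the enroll command so it can be reviewed before running. This is a minimal sketch, not an official procedure: the function name `build_enroll_cmd`, the example URL, and the `<service-token>` and `<policy-id>` placeholders are all hypothetical, and the flags are only the ones quoted earlier in this thread. The policy ID should be that of a policy you created yourself and added the fleet-server integration to.

```shell
#!/bin/sh
# Dry-run helper: prints the enroll command instead of executing it.
# All argument values below are placeholders (assumptions), not real values.
build_enroll_cmd() {
    es_url=$1      # Elasticsearch URL of the deployment
    token=$2       # Fleet Server service token
    policy_id=$3   # ID of the policy that contains the fleet-server integration
    printf 'sudo elastic-agent enroll -f --fleet-server-es=%s --fleet-server-service-token=%s --fleet-server-policy=%s\n' \
        "$es_url" "$token" "$policy_id"
}

# Example call with placeholder values; copy the printed line and run it once it looks right.
build_enroll_cmd "https://elasticsearch.example.org:9243" "<service-token>" "<policy-id>"
```

Printing the command first makes it easy to double-check the policy ID before the agent starts waiting on a policy that does not exist.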