Hello. I've been looking around to see if anyone else has experienced a similar issue, but I haven't found anything. I've set up an Elasticsearch cluster and Kibana, all using SSL certificates created by following the Elastic Stack Basic Security Guide. I'm now attempting to install and enroll a Fleet Server on the same machine as my Kibana instance. After following the steps here, I end up with the following command to run:
sudo ./elastic-agent install -f \
--url=https://34.xxx.xxx.xxx:8220 \
--fleet-server-es=https://3.xxx.xxx.xxx:9200 \
--fleet-server-service-token=<NEWLY_GENERATED_SERVICE_TOKEN> \
--fleet-server-policy=<DEFAULT_POLICY_WITH_FLEET_SERVER_INTEGRATION> \
--fleet-server-es-ca=/etc/kibana/elasticsearch-ca.pem \
--certificate-authorities=/etc/kibana/fleet-server-certs/fleet-ca.crt \
--fleet-server-cert=/etc/kibana/fleet-server-certs/fleet-server.crt \
--fleet-server-cert-key=/etc/kibana/fleet-server-certs/fleet-server.key
I've tried numerous other versions of the same command including the quick start version which creates self-signed certificates. However, I always get exactly the same error code with no additional information on what's wrong.
YYYY-MM-DDTHH:MM:SS.sssZ INFO cmd/enroll_cmd.go:776 Fleet Server - Starting
Error: fleet-server failed: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/7.17/fleet-troubleshooting.html
Error: enroll command failed with exit code: 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/7.17/fleet-troubleshooting.html
Since I get no additional information, I have no idea what's wrong. Every other post I've found about Fleet Server startup issues has a more specific error, or at the very least has additional log entries before the failure. I only have one: "Fleet Server - Starting". Note that I've already checked that the Kibana and Elastic Agent versions match. What issues could cause the Fleet Server installation to fail right at the start of the process? Are there logs I could examine to find the issue? Any help is greatly appreciated!
EDIT: I've recreated the Fleet Server certs to see if that would fix the issue. I passed -ip 34.xxx.xxx.xxx (public ip),10.xxx.xxx.xxx (private ip),0.0.0.0 when generating the cert, with no -dns arg since I'm just using host names for this initial deployment. I also re-downloaded the agent and placed it in /opt/ along with all the necessary certs. My command looks pretty much the same. Note that I changed the ES IP to another ES host for an unrelated reason.
sudo ./elastic-agent install --url=https://34.xxx.xxx.xxx:8220 \
--fleet-server-es=https://54.xxx.xxx.xxx:9200 \
--fleet-server-service-token=AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuLTE2NDY0MTYxOTYxMTk6bG9OYlFFdWpULXlFX0h5ek81MmZzZw \
--fleet-server-policy=499b5aa7-d214-5b5d-838b-3cd76469844e \
--certificate-authorities=/opt/elastic-agent-7.17.0-linux-x86_64/ca.crt \
--fleet-server-es-ca=/opt/elastic-agent-7.17.0-linux-x86_64/elasticsearch-ca.pem \
--fleet-server-cert=/opt/elastic-agent-7.17.0-linux-x86_64/fleet-server.crt \
--fleet-server-cert-key=/opt/elastic-agent-7.17.0-linux-x86_64/fleet-server.key
I still get the same error with no additional information.
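To rule out the regenerated certificate itself, its Subject Alternative Name entries can be inspected with openssl. The following is a minimal sketch, assuming openssl is installed on the host; the helper name cert_sans is hypothetical, and the example path is the one from the command above.

```shell
# Hypothetical helper: print the Subject Alternative Name entries
# of a PEM-encoded certificate so the -ip values can be confirmed.
cert_sans() {
    openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}

# Example (path taken from the command above):
# cert_sans /opt/elastic-agent-7.17.0-linux-x86_64/fleet-server.crt
```

If the public and private IPs are present, they should appear as "IP Address:" entries in the output.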
EDIT 2: I installed the agent first and then ran the enroll command separately to see if that would help, and this time I did get additional information.
$~ ./elastic-agent install -f
$~ ./elastic-agent enroll -f <previous args>
Response:
YYYY-MM-DDTHH:MM:SS.sssZ INFO cmd/enroll_cmd.go:571 Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
YYYY-MM-DDTHH:MM:SS.sssZ INFO application/application.go:67 Detecting execution mode
YYYY-MM-DDTHH:MM:SS.sssZ INFO application/application.go:88 Agent is in Fleet Server bootstrap mode
YYYY-MM-DDTHH:MM:SS.sssZ INFO [api] api/server.go:62 Starting stats endpoint
YYYY-MM-DDTHH:MM:SS.sssZ INFO application/fleet_server_bootstrap.go:130 Agent is starting
YYYY-MM-DDTHH:MM:SS.sssZ INFO [api] api/server.go:64 Metrics endpoint listening on: /opt/elastic-agent-7.17.0-linux-x86_64/data/tmp/elastic-agent.sock (configured: unix:///opt/elastic-agent-7.17.0-linux-x86_64/data/tmp/elastic-agent.sock)
YYYY-MM-DDTHH:MM:SS.sssZ INFO application/fleet_server_bootstrap.go:140 Agent is stopped
YYYY-MM-DDTHH:MM:SS.sssZ INFO stateresolver/stateresolver.go:48 New State ID is iLJi9-Kz
YYYY-MM-DDTHH:MM:SS.sssZ INFO stateresolver/stateresolver.go:49 Converging state requires execution of 1 step(s)
YYYY-MM-DDTHH:MM:SS.sssZ INFO log/reporter.go:40 YYYY-MM-DDTHH:MM:SS.sssZ - message: Application: fleet-server--7.17.0[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
YYYY-MM-DDTHH:MM:SS.sssZ INFO stateresolver/stateresolver.go:66 Updating internal state
YYYY-MM-DDTHH:MM:SS.sssZ INFO cmd/enroll_cmd.go:776 Fleet Server - Starting
YYYY-MM-DDTHH:MM:SS.sssZ ERROR status/reporter.go:236 Elastic Agent status changed to: 'error'
YYYY-MM-DDTHH:MM:SS.sssZ ERROR log/reporter.go:36 YYYY-MM-DDTHH:MM:SS.sssZ - message: Application: fleet-server--7.17.0[]: State changed to FAILED: Error - dial tcp 54.xxx.xxx.xxx:9200: i/o timeout - type: 'ERROR' - sub_type: 'FAILED'
YYYY-MM-DDTHH:MM:SS.sssZ INFO status/reporter.go:236 Elastic Agent status changed to: 'online'
YYYY-MM-DDTHH:MM:SS.sssZ INFO log/reporter.go:40 YYYY-MM-DDTHH:MM:SS.sssZ - message: Application: fleet-server--7.17.0[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
YYYY-MM-DDTHH:MM:SS.sssZ ERROR status/reporter.go:236 Elastic Agent status changed to: 'error'
YYYY-MM-DDTHH:MM:SS.sssZ ERROR log/reporter.go:36 YYYY-MM-DDTHH:MM:SS.sssZ - message: Application: fleet-server--7.17.0[]: State changed to FAILED: Error - dial tcp 54.xxx.xxx.xxx: i/o timeout - type: 'ERROR' - sub_type: 'FAILED'
It then continues attempting to restart but keeps getting this dial tcp error with the Elasticsearch node IP address.
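An "i/o timeout" on dial usually means the TCP connection never completed (for example, packets being dropped by a firewall or security group) rather than being refused. A quick reachability check from the Fleet Server host can narrow this down. This is only a sketch, assuming bash and the coreutils timeout command are available; the helper name check_tcp is hypothetical.

```shell
# Hypothetical helper: returns 0 if a TCP connection to host:port
# can be opened within the given number of seconds (default 5).
check_tcp() {
    host=$1
    port=$2
    secs=${3:-5}
    # bash's /dev/tcp pseudo-device attempts a plain TCP connect;
    # timeout kills the attempt if it hangs (exit status 124).
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (replace with the real Elasticsearch address):
# check_tcp 54.xxx.xxx.xxx 9200 && echo reachable || echo unreachable
```

If the check fails from the Kibana/Fleet Server machine but succeeds from elsewhere, the problem is likely network access to port 9200 rather than the certificates or the enroll arguments.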