Hello everyone, I'm having some trouble installing a Fleet Server, but there's a twist to it:
I'm running Elastic 8.4, installed via Elastic Operator on Kubernetes, and I'm not using a certificate authority; everything is done with self-signed certificates.
The thing is, I ALREADY had a Fleet Server up and running for about a year, with agents reporting back to Elasticsearch with no issues (the agents showed Healthy status and so on).
A couple of weeks ago, I noticed the Fleet Server showing "offline" status, with all the agents in the same state (I assume because they need the Fleet Server to check in properly). So I did some troubleshooting, and here's what I found:
When I run elastic-agent status on the Fleet Server node (this Fleet Server was originally installed with the --insecure flag), I get:
Status: FAILED
Message: app fleet-server--8.4.1-3dbfdb00: Error - x509: certificate signed by unknown authority
Applications:
* endpoint-security (HEALTHY)
Protecting with policy {94b7223b-ad77-4f8f-855e-e9a4c7299c2c}
* filebeat (HEALTHY)
Running
* fleet-server (FAILED)
Error - x509: certificate signed by unknown authority
* filebeat_monitoring (HEALTHY)
Running
When checking the logs at /opt/Elastic/Agent/data/elastic-agent-8d7885/logs I see:
{"log.level":"info","@timestamp":"2024-09-09T16:10:41.259Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:41-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to RESTARTING: Restarting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:41.298Z","log.origin":{"file.name":"process/stdlogger.go","file.line":54},"message":"fleet-server stderr: \"{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:41-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:41-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n\"","agent.console.name":"fleet-server","agent.console.type":"stderr","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:41.813Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:41-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:42.313Z","log.origin":{"file.name":"status/reporter.go","file.line":260},"message":"Elastic Agent status changed to \"error\": \"app fleet-server--8.4.1-3dbfdb00: Error - x509: certificate signed by unknown authority\"","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:42.313Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2024-09-09T11:10:42-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to FAILED: Error - x509: certificate signed by unknown authority - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-09-09T16:10:52.315Z","log.origin":{"file.name":"status/reporter.go","file.line":260},"message":"Elastic Agent status changed to \"degraded\": \"component gateway-f65de4e8: checkin failed: fail to checkin to fleet-server: Post \\\"https://localhost.localdomain:8220/api/fleet/agents/b36fd94a-ea15-4b1a-b4be-f8071cee7edf/checkin?\\\": dial tcp 127.0.0.1:8220: connect: connection refused\"","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:52.316Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:52-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to RESTARTING: - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:52.317Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:52-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:52.317Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:52-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to RESTARTING: Restarting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:52.355Z","log.origin":{"file.name":"process/stdlogger.go","file.line":54},"message":"fleet-server stderr: \"{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:52-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:52-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n\"","agent.console.name":"fleet-server","agent.console.type":"stderr","ecs.version":"1.6.0"}
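Since most of these lines are routine state changes, here is a minimal sketch for pulling only the warn and error entries out of the agent logs. The function name agent_errors is just an illustrative helper, not part of Elastic Agent, and it assumes the log files are the one-JSON-object-per-line format shown above.

```shell
#!/bin/sh
# Sketch: print only the warn/error entries from Elastic Agent's JSON logs.
# agent_errors is a hypothetical helper name, not an Elastic Agent command.
# Usage: agent_errors /opt/Elastic/Agent/data/elastic-agent-8d7885/logs/*
agent_errors() {
    # -h suppresses file-name prefixes so every output line stays valid JSON.
    # The dot in log.level is escaped so it only matches a literal dot.
    grep -hE '"log\.level":"(warn|error)"' "$@"
}
```

Note that the fleet-server stderr lines are logged at error level even when the embedded message is only informational (as in the "No applicable limit for 0 agents" entries above), so those will show up in the filtered output too.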
So I tried to install another Fleet Server on a different node that was available for it (everything runs on AlmaLinux 8 hosts):
Management -> Fleet -> Settings
- Set a new host URL https://new.node.ip:8220
Then Management -> Fleet -> Agents -> Add Fleet Server, and followed the instructions:
- Fleet Server host: https://new.node.ip:8220
- Generated the policy for the Fleet Server
Installed via:
sudo ./elastic-agent install \
--fleet-server-es=https://elasticsearch.cluster.ip:9200 \
--fleet-server-service-token=TOKEN \
--fleet-server-policy=fleet-server-policy \
--fleet-server-es-ca-trusted-fingerprint=FINGERPRINT \
--insecure
You know, the usual way to install a Fleet Server that uses self-signed certificates (again, it was working fine before). However, this is what happens:
Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:y
{"log.level":"info","@timestamp":"2024-09-09T10:58:05.121-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":403},"message":"Generating self-signed certificate for Fleet Server","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T10:58:06.733-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Starting","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T10:58:08.735-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Error - x509: certificate signed by unknown authority","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T10:59:08.751-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Starting","ecs.version":"1.6.0"}
Error: fleet-server failed: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.4/fleet-troubleshooting.html
Error: enroll command failed with exit code: 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.4/fleet-troubleshooting.html
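Because the failure is an x509 trust error, one thing that can be ruled out is whether the FINGERPRINT passed to --fleet-server-es-ca-trusted-fingerprint still matches the CA that Elasticsearch is using, and whether that CA is still within its validity period. The sketch below assumes the CA certificate has already been saved locally as a PEM file (the ca.crt path is hypothetical), and the helper names ca_fingerprint and ca_still_valid are made up for illustration. Nothing above confirms that an expired or replaced CA is the cause; this only checks for it.

```shell
#!/bin/sh
# Sketch: inspect a CA certificate that has been saved locally as PEM.
# Assumption: the file path passed in (e.g. ./ca.crt) is where you exported it.

# Print the SHA-256 fingerprint as a hex string without colons,
# which is the form --fleet-server-es-ca-trusted-fingerprint expects.
ca_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256 \
        | cut -d= -f2 | tr -d ':'
}

# Exit status 0 if the certificate has not expired yet, non-zero otherwise.
ca_still_valid() {
    openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}
```

For example, `ca_fingerprint ./ca.crt` can be compared against the FINGERPRINT value used in the install command, and `ca_still_valid ./ca.crt || echo "CA has expired"` reports an expired certificate.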
I have also tried installing with the --fleet-server-es-insecure flag (found it in this issue). With this, the Fleet Server installs properly and shows a Healthy status, but the agents won't report any data (although they show Healthy as well).
I would really appreciate some help here, and thank you in advance.