Fleet agent installation failed: x509 error with --insecure

Hello everyone, I'm having some trouble installing a Fleet Server, but there's a twist to it:

I'm running Elastic 8.4, installed via the Elastic Operator on Kubernetes, and I'm not using a certificate authority; everything is done with self-signed certificates.

The thing is, I ALREADY had a Fleet Server up and running for about a year, with agents reporting back to Elasticsearch with no issues (the agents showed a Healthy status).

A couple of weeks ago, I noticed the Fleet Server in the "offline" status and all the agents in the same state (I assume that is because they need the Fleet Server to check in properly). So I did some troubleshooting, and here's what I found:

When executing elastic-agent status on the Fleet server node I get (this Fleet server was originally installed with the --insecure flag):

Status: FAILED
Message: app fleet-server--8.4.1-3dbfdb00: Error - x509: certificate signed by unknown authority
Applications:
  * endpoint-security    (HEALTHY)
                         Protecting with policy {94b7223b-ad77-4f8f-855e-e9a4c7299c2c}
  * filebeat             (HEALTHY)
                         Running
  * fleet-server         (FAILED)
                         Error - x509: certificate signed by unknown authority
  * filebeat_monitoring  (HEALTHY)
                         Running

When checking the logs at /opt/Elastic/Agent/data/elastic-agent-8d7885/logs I see:

{"log.level":"info","@timestamp":"2024-09-09T16:10:41.259Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:41-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to RESTARTING: Restarting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:41.298Z","log.origin":{"file.name":"process/stdlogger.go","file.line":54},"message":"fleet-server stderr: \"{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:41-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:41-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n\"","agent.console.name":"fleet-server","agent.console.type":"stderr","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:41.813Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:41-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:42.313Z","log.origin":{"file.name":"status/reporter.go","file.line":260},"message":"Elastic Agent status changed to \"error\": \"app fleet-server--8.4.1-3dbfdb00: Error - x509: certificate signed by unknown authority\"","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:42.313Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2024-09-09T11:10:42-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to FAILED: Error - x509: certificate signed by unknown authority - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-09-09T16:10:52.315Z","log.origin":{"file.name":"status/reporter.go","file.line":260},"message":"Elastic Agent status changed to \"degraded\": \"component gateway-f65de4e8: checkin failed: fail to checkin to fleet-server: Post \\\"https://localhost.localdomain:8220/api/fleet/agents/b36fd94a-ea15-4b1a-b4be-f8071cee7edf/checkin?\\\": dial tcp 127.0.0.1:8220: connect: connection refused\"","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:52.316Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:52-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to RESTARTING:  - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:52.317Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:52-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T16:10:52.317Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2024-09-09T11:10:52-05:00 - message: Application: fleet-server--8.4.1[b36fd94a-ea15-4b1a-b4be-f8071cee7edf]: State changed to RESTARTING: Restarting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-09T16:10:52.355Z","log.origin":{"file.name":"process/stdlogger.go","file.line":54},"message":"fleet-server stderr: \"{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:52-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n{\\\"level\\\":\\\"info\\\",\\\"time\\\":\\\"2024-09-09T11:10:52-05:00\\\",\\\"message\\\":\\\"No applicable limit for 0 agents, using default.\\\"}\\n\"","agent.console.name":"fleet-server","agent.console.type":"stderr","ecs.version":"1.6.0"}

So I tried to install another Fleet Server on a different node that was available for it:
Management -> Fleet -> Settings - Set a new host URL https://new.node.ip:8220

Everything runs on AlmaLinux 8 hosts.

Then Management -> Fleet -> Agents -> Add Fleet Server and follow instructions:

Fleet Server host: https://new.node.ip:8220 and generate the policy for the fleet server

Install via:

sudo ./elastic-agent install \
  --fleet-server-es=https://elasticsearch.cluster.ip:9200 \
  --fleet-server-service-token=TOKEN \
  --fleet-server-policy=fleet-server-policy \
  --fleet-server-es-ca-trusted-fingerprint=FINGERPRINT \
  --insecure

You know, the usual way to install a Fleet Server that uses self-signed certificates (again, it was working fine before). However, this happens:

Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:y
{"log.level":"info","@timestamp":"2024-09-09T10:58:05.121-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":403},"message":"Generating self-signed certificate for Fleet Server","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T10:58:06.733-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Starting","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T10:58:08.735-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Error - x509: certificate signed by unknown authority","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-09-09T10:59:08.751-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Starting","ecs.version":"1.6.0"}
Error: fleet-server failed: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.4/fleet-troubleshooting.html
Error: enroll command failed with exit code: 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.4/fleet-troubleshooting.html

I have also tried installing with the --fleet-server-es-insecure flag (found it in this issue). With this, the Fleet Server installs properly and shows a Healthy status, but the agents won't report any data (although they show Healthy as well).

I would really appreciate some help here, and thank you in advance.

Solved it! The http-public certificate was invalid. I updated the fingerprint in the Fleet output settings and it propagated through the fleet properly.


I'm having the same certificate error with self-signed certs and don't fully understand your solution. How did you update the fingerprint in the install?

I am also facing the same issue and have no idea what was meant by @kev24's answer.

Hey! Sure thing:

When deploying the ES cluster for the first time, I had to extract the fingerprint of the HTTP certificate by executing the following command on any of the members of the cluster:

openssl x509 -in /usr/share/config/http-certs/ca.crt -sha256 -fingerprint | grep SHA256 | sed 's/://g'

The result of this command is the trusted fingerprint that should be configured on the output section of the fleet settings.

After a year, Elasticsearch renewed the certificate, changing this fingerprint, so the agents with the old fingerprint would throw the x509 error.
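If you want to spot the next renewal before the agents go offline, a small check like the one below might help. This is only a sketch: the function names cert_expiry and cert_expires_within_days are made up for illustration, and it assumes openssl is available on the node and that the certificate is a PEM file you can read.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of any Elastic tooling.

# Print the notAfter (expiry) date of a PEM certificate.
cert_expiry() {
  openssl x509 -in "$1" -noout -enddate | cut -d= -f2
}

# Return 0 if the certificate at $1 expires within the next $2 days (default 30).
# Returns 2 if the file cannot be read, so a missing file is not reported as "expiring".
cert_expires_within_days() {
  local days="${2:-30}"
  [ -r "$1" ] || return 2
  # -checkend exits 0 when the cert is still valid after the given number of seconds.
  ! openssl x509 -in "$1" -noout -checkend "$((days * 86400))" >/dev/null
}
```

For example, cert_expires_within_days /usr/share/config/http-certs/ca.crt 30 && echo "renewal is coming" would warn about a month ahead, using the same path as in the command above.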

The solution was basically updating the fingerprint in the fleet configuration, so it would be propagated automatically through the agents.
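Before changing anything in Fleet, it may be worth confirming that the fingerprint really did change. The sketch below compares a certificate file against the value currently configured in the output settings. The helper names cert_fingerprint and fingerprint_matches are hypothetical, and it assumes openssl is installed and the certificate is in PEM format.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of any Elastic tooling.

# Print the SHA-256 fingerprint of a PEM certificate as bare hex (no colons).
cert_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2 | tr -d ':'
}

# Return 0 if the certificate at $1 has the fingerprint given in $2.
# The comparison ignores colons and letter case in the expected value.
fingerprint_matches() {
  local actual expected
  actual=$(cert_fingerprint "$1" | tr 'a-f' 'A-F')
  expected=$(printf '%s' "$2" | tr -d ':' | tr 'a-f' 'A-F')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

Something like fingerprint_matches /usr/share/config/http-certs/ca.crt "$OLD_FINGERPRINT" || echo "fingerprint changed" (where OLD_FINGERPRINT is whatever is currently set in the Fleet output) would tell you whether the renewal is the cause of the x509 error.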


I hope this answers your questions in more detail.

Indeed that does! I'll give it a try since I've been banging my head against why fleet wouldn't connect despite declaring the certificate authority for the HTTP and elastic stack in many different troubleshooting attempts.