Hello @stephenb, just to give a better explanation of this.
We had a Fleet Server, let's say the host was named fleet-server, and we created a self-signed certificate for it that was valid for the following:
- the IP address of the server
- the hostname fleet-server (it is in the hosts file of every VM)
- the hostname fleet-server.company.domain (it is also in the hosts file of every VM)
In the Fleet UI settings our default Fleet Server host was configured as https://fleet-server:8220, and we installed the agents using https://fleet-server:8220 as the URL for the Fleet Server, together with the --insecure parameter, since we used a self-signed certificate.
Recently we decided to deploy Elastic Agents to replace another log collector, and we decided that it would be better to use a company certificate, signed by a known CA, to avoid using the --insecure parameter.
We have a chain certificate and key for *.company.domain and would just need to replace the files on the Fleet Server. We assumed that this would not impact the current agents, because they were installed with --insecure and so do not validate the certificate.
Before replacing the certificate, we changed the Fleet Server host in Fleet to use https://fleet-server.company.domain, which was valid in the self-signed certificate and would also be valid with the company certificate.
This worked without any issue, and we could check that the agents were communicating with the Fleet Server through https://fleet-server.company.domain.
Since this domain would be valid with both certificates, we decided to change the local files on the Fleet Server. We stopped the Fleet Server and replaced the certificate and key files. Before, we used a self-generated CA, and in this case we would be using a known CA, so there was no CA file to replace; but since our certificate was a chain certificate, we also replaced the CA file with this chain file.
I'm not sure this would work anyway, because it wasn't a simple certificate change: we were moving from a self-generated CA to a known CA.
But the main issue was that after we changed the files and restarted the Fleet Server, it never came back online. In the log file we could see that it was still trying to connect to https://fleet-server:8220. This configuration was not present anywhere else, as we had changed it to https://fleet-server.company.domain:8220 while still using the self-signed certificate.
The error logs were pretty clear in this case, saying that the current certificate was valid for *.company.domain and not for fleet-server.
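To illustrate why that error is expected, here is a minimal Python sketch of the usual wildcard rule, where a `*` stands for exactly one left-most label. The helper names `name_matches` and `covered` are made up for this post, the IP address entry is left out, and this is a simplification, not the code that Fleet Server or its TLS library actually runs.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate DNS name (optionally '*.x.y') covers hostname.

    Simplified rule: '*' is only honoured as the whole left-most label and
    matches exactly one non-empty label.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if host_labels[0] == "":
        return False
    if pattern_labels[0] != "*" and pattern_labels[0] != host_labels[0]:
        return False
    # Every label after the first must match exactly.
    return pattern_labels[1:] == host_labels[1:]


def covered(cert_names, hostname: str) -> bool:
    """True if any DNS name listed in the certificate covers hostname."""
    return any(name_matches(name, hostname) for name in cert_names)


# Names from this thread (the IP address entry is omitted for brevity).
self_signed = ["fleet-server", "fleet-server.company.domain"]
company = ["*.company.domain"]

for host in ("fleet-server", "fleet-server.company.domain"):
    print(host, "self-signed:", covered(self_signed, host),
          "company:", covered(company, host))
```

The short name fleet-server has no domain part, so the wildcard certificate cannot cover it, while fleet-server.company.domain is covered by both certificates.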
It seems that this setting in the Fleet UI is not applied to the Fleet Server itself. I think this can be replicated with the following steps:
- create a self-signed certificate valid for host and host.some.domain
- install Fleet Server using only host for the URL, and also use only https://host:8220 in the Fleet UI as the default Fleet Server host.
- change the default Fleet Server host to https://host.some.domain:8220.
- replace the certificate with one that is only valid for host.some.domain and check whether Fleet still works.
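For that last step, a rough way to check from any host whether a given URL passes certificate validation (that is, what happens without --insecure) is sketched below in Python, using only the standard library. The function names and the `cafile` argument are my own assumptions for illustration. It only tests the TLS handshake against the name in the URL and says nothing about which URL Fleet Server uses internally.

```python
import socket
import ssl
from urllib.parse import urlsplit


def split_host_port(url: str, default_port: int = 8220):
    """Extract (hostname, port) from a URL such as https://host:8220."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return parts.hostname, parts.port or default_port


def check_tls_name(url: str, cafile=None, timeout: float = 5.0):
    """Try a verified TLS handshake; return (ok, message).

    cafile: optional path to a CA bundle (for example a self-generated CA);
    when None, the system trust store is used.
    """
    host, port = split_host_port(url)
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname enables both SNI and hostname verification.
            with context.wrap_socket(sock, server_hostname=host):
                return True, f"certificate is trusted and valid for {host}"
    except ssl.SSLCertVerificationError as exc:
        return False, f"verification failed: {exc.verify_message}"
    except OSError as exc:
        return False, f"connection failed: {exc}"
```

Calling `check_tls_name("https://host:8220")` and `check_tls_name("https://host.some.domain:8220")` before and after the certificate swap should show which of the two names the new certificate still covers.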
In the end we bit the bullet and decided to re-enroll everything with the correct certificate as we had just a couple of agents running.
But I don't think this would be a valid thing to do if we had hundreds or thousands of Agents.