Elastic-agent output connection refused error while setting up mutual TLS for Fleet

First, let me explain what I am trying to achieve. I have a cluster of 3 nodes, each with all the roles, and Kibana is installed on the same VM as one of the nodes. Mutual TLS is enabled between the cluster nodes and Kibana. I want to add two new VMs to this setup: one acting as fleet-server (to be exact, if I'm not mistaken, as both elastic-agent and fleet-server) and the other acting as elastic-agent only, with mutual TLS active on all possible connections.

To give an overview of the current mutual TLS setup: first, using elasticsearch-certutil, I created a self-signed CA certificate:

  • ca.crt
  • ca.key

Then I used it to generate all the other certificates.

These are the steps I took to install fleet-server:

First I modified my output to point to my nodes, like below. I know there is more to configure in this section and I will get back to it later; I have tried multiple solutions that did not work.

Then I chose to automatically generate the policy,

specified the fleet-server URL,

and then added the server and generated a token, following the section below:

I tried to install from the tar file with the command below:

--url=https://172.12.1.164:8220
--fleet-server-port=8220
--fleet-server-es=https://172.12.1.161:9200
--fleet-server-es-ca=/tmp/ca.crt
--fleet-server-es-cert=/tmp/fleet-server/fleet-server.crt
--fleet-server-es-cert-key=/tmp/fleet-server/fleet-server.key
--fleet-server-service-token=AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuLTE3NTM2MTM1NjEyMzQ6bmJ6dHpRNlpUWmFCZXZ6T09sRmRkQQ
--fleet-server-policy=fleet-server-policy
--fleet-server-cert=/tmp/fleet-server/fleet-server.crt
--fleet-server-cert-key=/tmp/fleet-server/fleet-server.key
--fleet-server-client-auth=required
--certificate-authorities=/tmp/ca.crt
--elastic-agent-cert=/tmp/fleet-server/fleet-server.crt
--elastic-agent-cert-key=/tmp/fleet-server/fleet-server.key
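
For reference, the flags above can be assembled into a single invocation. This is only a sketch: it assumes the command is `sudo ./elastic-agent install` run from the extracted tar directory (the command name itself is not shown in the post), and it reads the service token from a hypothetical `FLEET_SERVER_SERVICE_TOKEN` environment variable instead of hard-coding it. It only prints the command; it does not run it.

```shell
# Paths and addresses are the ones from the post.
CERT_DIR=/tmp/fleet-server
# Hypothetical environment variable; falls back to a visible placeholder.
TOKEN="${FLEET_SERVER_SERVICE_TOKEN:-<service-token>}"

# Collect the flags as positional parameters (POSIX sh has no arrays).
set -- \
  --url=https://172.12.1.164:8220 \
  --fleet-server-port=8220 \
  --fleet-server-es=https://172.12.1.161:9200 \
  --fleet-server-es-ca=/tmp/ca.crt \
  --fleet-server-es-cert="$CERT_DIR/fleet-server.crt" \
  --fleet-server-es-cert-key="$CERT_DIR/fleet-server.key" \
  --fleet-server-service-token="$TOKEN" \
  --fleet-server-policy=fleet-server-policy \
  --fleet-server-cert="$CERT_DIR/fleet-server.crt" \
  --fleet-server-cert-key="$CERT_DIR/fleet-server.key" \
  --fleet-server-client-auth=required \
  --certificate-authorities=/tmp/ca.crt \
  --elastic-agent-cert="$CERT_DIR/fleet-server.crt" \
  --elastic-agent-cert-key="$CERT_DIR/fleet-server.key"

# Dry run: build the full command, one flag per line, and print it for review.
# "sudo ./elastic-agent install" is an assumption, see the note above.
install_cmd="sudo ./elastic-agent install$(printf ' \\\n  %s' "$@")"
printf '%s\n' "$install_cmd"
```

Printing first makes it easy to compare the flags against the documentation before anything is installed.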

I am going to explain why I chose this set of parameters for installation. Maybe I just misunderstood the whole thing:

--url=https://172.12.1.164:8220: To specify the IP and port on which fleet-server will be available.

--fleet-server-port=8220: To specify the port on which fleet-server will be available. I don't know if it is necessary, considering the port is already specified in --url, but I set it anyway just in case.

--fleet-server-es=https://172.12.1.161:9200: This is how fleet-server will connect to the cluster for reading and storing its data. It would probably be better to list the other nodes too, but I did not know whether it accepts a list, so I added just one node.

--fleet-server-es-ca=/tmp/ca.crt: This is the original self-signed CA I used for generating all the other certificates. I added it so that when Elasticsearch presents a certificate to fleet-server, fleet-server can verify it against this CA. Did I understand this parameter right?

--fleet-server-es-cert=/tmp/fleet-server/fleet-server.crt: This is the certificate that fleet-server (by fleet-server I mean the fleet-server subprocess of elastic-agent) presents to Elasticsearch for mutual TLS.

--fleet-server-es-cert-key=/tmp/fleet-server/fleet-server.key: The key for --fleet-server-es-cert.

--fleet-server-service-token=AAEAAWVsYXN0aWMv: This is the token that fleet-server will use to authenticate itself to the cluster.

--fleet-server-policy=fleet-server-policy: The policy that will be applied to elastic-agent. It is the automatically generated policy, and it causes the fleet-server subprocess to run, making the elastic-agent act as a fleet-server.

--fleet-server-cert=/tmp/fleet-server/fleet-server.crt: This is the certificate fleet-server will present to clients, such as an elastic-agent, that try to connect to fleet-server.

I am using the same certificate for both connections: fleet-server to the Elasticsearch cluster and fleet-server to elastic-agents.

--fleet-server-cert-key=/tmp/fleet-server/fleet-server.key: The key for the previous parameter.

--fleet-server-client-auth=required: This specifies that connections to fleet-server must use mutual TLS, so an elastic-agent trying to connect to fleet-server must present a certificate.

--certificate-authorities=/tmp/ca.crt: This is the same self-signed CA certificate used to generate all the other certificates.

My understanding here is that a fleet server can be seen as two parts (elastic-agent + fleet-server subprocess), so each of these components uses --certificate-authorities=/tmp/ca.crt to verify the certificates presented to it. I am not sure I understood this right.

--elastic-agent-cert=/tmp/fleet-server/fleet-server.crt: This is for the elastic-agent part of the fleet server when it communicates with fleet-server. It is based on the assumption that a fleet server consists of two parts (elastic-agent + fleet-server subprocess).

--elastic-agent-cert-key=/tmp/fleet-server/fleet-server.key: Key for the previous parameter.

It is worth mentioning that all certificates were generated without a password.

I guess the most important question here is: did I select the right set of parameters, and did I understand them right?

Following the steps above, the fleet server connects and shows a healthy state. But when I check the logs in /opt/Elastic/Agent/data/elastic-agent-*/logs/elastic-agent.ndjson, there are errors about the agent trying to connect to each of the 3 cluster nodes in turn and failing. I believe the problem must be related to the output.

I tried different methods and ran into different errors. When I use the ca.crt fingerprint, I get errors about the certificate chain not being included, which I tried to solve with this answer, but it did not work. Then I tried dropping the fingerprint and using the YAML section instead, either providing the path to the self-signed CA or passing the CA in directly like this answer, but it still did not work. Am I doing something totally wrong? Can someone please provide some general guidance on how to set this up?

By the way, this is all done on Elasticsearch version 8.17.

How did you create the certs for the 3 Elasticsearch HTTPS endpoints?

The ca.crt and ca.key are created using

elasticsearch-certutil ca --pem

for example

./elasticsearch-certutil ca --days --pem --out 

and the other certs are created using the command

elasticsearch-certutil cert --pem

for example

./elasticsearch-certutil cert --ca-cert ca.crt --ca-key ca.key  --days --ip  --pem --out  --name 

Are you putting in all the IPs, etc.? Are you using an instance.yml to put in all the correct configurations?

Example

instances:
  - name: "node1"
    ip:
      - "192.0.2.1"
    dns:
      - "node1.mydomain.com"
  - name: "node2"
    ip:
      - "192.0.2.2"
      - "198.51.100.1"
  - name: "node3"
  - name: "node4"
    dns:
      - "node4.mydomain.com"
      - "node4.internal"
  - name: "CN=node5,OU=IT,DC=mydomain,DC=com"
    filename: "node5"
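
To check which IPs and DNS names actually ended up in a certificate, its Subject Alternative Name extension can be printed with openssl. A minimal sketch: since the real certificates are not available here, it first creates a throwaway self-signed certificate as a stand-in (the IP is the fleet-server address from the post, the DNS name is hypothetical) and then inspects it. It assumes OpenSSL 1.1.1 or newer for `-addext`.

```shell
tmp=$(mktemp -d)

# Stand-in certificate only; in practice this would be a certificate
# produced by elasticsearch-certutil, e.g. fleet-server.crt.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=fleet-server" \
  -addext "subjectAltName=IP:172.12.1.164,DNS:fleet-server.example.internal" \
  -keyout "$tmp/node.key" -out "$tmp/node.crt" 2>/dev/null

# Print the SAN extension: the address a client connects to must appear
# here when the client uses verification_mode: full.
openssl x509 -in "$tmp/node.crt" -noout -text \
  | grep -A1 "Subject Alternative Name"
```

On a real setup only the last command is needed, pointed at each node certificate in turn.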

I will double-check, but I believe there is no problem with the certificates. The reason I say this is that in Elasticsearch I have the following for both the HTTP and transport layers on all nodes:

verification_mode: full
client_authentication: required

Also, when I use curl from the fleet-server VM with the certificate generated for fleet-server (fleet-server.crt and fleet-server.key), I get a proper response.

By the way, I checked the certificates again and they were OK.
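
Because the agent logs show failures against all 3 nodes, it may help to repeat that curl check against every node rather than a single one. A rough sketch, assuming the file paths from the install command; only 172.12.1.161 appears in the post, so the other two addresses are hypothetical placeholders. Nothing is contacted unless `RUN_CHECKS=1` is set.

```shell
CA=/tmp/ca.crt
CERT=/tmp/fleet-server/fleet-server.crt
KEY=/tmp/fleet-server/fleet-server.key
# First address is from the post; the other two are hypothetical placeholders.
NODES="172.12.1.161 172.12.1.162 172.12.1.163"

check_node() {
  # --cacert verifies the node's certificate; --cert/--key present the
  # client certificate, which client_authentication: required demands.
  # No --fail: an HTTP 401 still means the TLS handshake itself succeeded.
  curl --silent --show-error --max-time 5 \
    --cacert "$CA" --cert "$CERT" --key "$KEY" \
    "https://$1:9200/" >/dev/null
}

# Opt-in, so the snippet can be read and adjusted before it touches the network.
if [ "${RUN_CHECKS:-0}" = 1 ]; then
  for node in $NODES; do
    if check_node "$node"; then
      echo "$node: TLS handshake OK"
    else
      echo "$node: TLS handshake FAILED"
    fi
  done
fi
```

A node that fails here but not for the others usually points at that node's certificate (for example a missing IP in its SAN) rather than at the Fleet configuration.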

Not sure why you are requiring client auth on the HTTP layer... Most of the time it is not used for HTTP. I'm not saying you can't, but if you do, every client needs to present its cert in full.

verification_mode: certificate

It is not really about what makes more sense; it is just what I have been told to do. Otherwise you are totally right. But in case I still have to do this, am I on the right path? It would be nice if you could give me a brief hint on how to do so, especially which parameters I should pass during the fleet installation and how I should handle the CA section of the output.

Mutual TLS is a perfectly fine approach and if it's a requirement that's all good.

It's more complex and you need to understand all the components involved.

To be transparent, I have not personally set up an on-prem deployment with self-signed certs and mutual TLS... It's going to take us some work.

I would suggest you read the following docs closely

Particularly this one

I have tried reading all the docs, but some parts are confusing and, unlike most of the Elasticsearch documentation, a bit unclear. But thanks anyway. If I can make it work, I will post all the steps I took under this post. Thanks again.

Yeah it's hard.....

And besides all the elastic part, a good understanding of the basics of mTLS is needed as well

My suggestion would be to get it to work first with just normal certificate validation... 1 way validation...

Get that working....

Then work on getting the mutual

Mutual TLS depends on having the client-side certificates available on both the Fleet and Elasticsearch sides. That's how mutual TLS works.

Like I said, I have not done it. I imagine few have.... But I would get one-way working and then work towards mutual, even if I was doing it just for myself. That's how I would go about it.
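
To make the client-side half concrete, here is a small self-contained sketch using plain openssl rather than elasticsearch-certutil. It builds a throwaway CA, signs a client certificate with it, and shows the check the receiving side performs: a certificate is accepted only if it chains to a CA that side trusts. All names and files are hypothetical and live in a temp directory.

```shell
work=$(mktemp -d)

# Throwaway CA, standing in for the ca.crt / ca.key pair from elasticsearch-certutil.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=demo-ca" \
  -keyout "$work/ca.key" -out "$work/ca.crt" 2>/dev/null

# Client key and signing request, then a client certificate signed by that CA.
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=fleet-server" \
  -keyout "$work/client.key" -out "$work/client.csr" 2>/dev/null
openssl x509 -req -days 1 -in "$work/client.csr" \
  -CA "$work/ca.crt" -CAkey "$work/ca.key" -CAcreateserial \
  -out "$work/client.crt" 2>/dev/null

# A certificate that was NOT issued by the CA (self-signed, unrelated).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=unrelated" \
  -keyout "$work/other.key" -out "$work/other.crt" 2>/dev/null

# The receiving side trusts only ca.crt.
if openssl verify -CAfile "$work/ca.crt" "$work/client.crt" >/dev/null 2>&1; then
  echo "client.crt: trusted"
fi
if ! openssl verify -CAfile "$work/ca.crt" "$work/other.crt" >/dev/null 2>&1; then
  echo "other.crt: rejected"
fi
```

In one-way TLS only the server's certificate goes through this check on the client; with mutual TLS the server runs the same check on the certificate the client presents, which is why each client needs its own certificate and key on disk.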

Here is the example that the link above points to.

elastic-agent install --url=https://your-fleet-server.elastic.co:443 \
--certificate-authorities=/path/to/fleet-ca,/path/to/agent-ca \
--elastic-agent-cert=/path/to/agent-cert \
--elastic-agent-cert-key=/path/to/agent-cert-key \
--elastic-agent-cert-key-passphrase=/path/to/agent-cert-key-passphrase \
--fleet-server-es=https://es.elastic.com:443 \
--fleet-server-es-ca=/path/to/es-ca \
--fleet-server-es-cert=/path/to/fleet-es-cert \
--fleet-server-es-cert-key=/path/to/fleet-es-cert-key \
--fleet-server-cert=/path/to/fleet-cert \
--fleet-server-cert-key=/path/to/fleet-cert-key \
--fleet-server-client-auth=required \
--fleet-server-service-token=FLEET-SERVER-SERVICE-TOKEN \
--fleet-server-policy=FLEET-SERVER-POLICY-ID \
--fleet-server-port=8220

BTW, this is a KEY and often missed step (and yes, not well documented): you need to put the client certs in the Elasticsearch output settings in Fleet, in the Advanced YAML section, so the client cert can be validated by Elasticsearch.

This is in

Fleet -> Settings -> Output -> The Elasticsearch output
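
As a rough sketch of what that Advanced YAML could look like for this setup, assuming the standard `ssl.*` settings of the Elasticsearch output and the file paths from the install command earlier in the thread. The snippet is written to a scratch file only so it can be reviewed and pasted into the Advanced YAML box; Fleet does not read the scratch file itself.

```shell
out=$(mktemp)

# Quoted heredoc delimiter: the YAML is written out literally, with no shell expansion.
cat > "$out" <<'EOF'
# CA used to verify the certificates presented by the Elasticsearch nodes.
ssl.certificate_authorities:
  - /tmp/ca.crt
# Client certificate and key presented to Elasticsearch for mutual TLS.
ssl.certificate: /tmp/fleet-server/fleet-server.crt
ssl.key: /tmp/fleet-server/fleet-server.key
EOF

cat "$out"
```

These paths are resolved on every host that uses this output, so each agent (including the agent-only VM) needs a CA file and a client certificate/key pair at the same locations.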