Sequence of creating keys, certificates and signings on multiple node env

All, I'm having some questions that I hope someone can direct me on.

I have 5 nodes - named 1-5

On node 1 - I run certgen, which allows me to add all the "instances". I'm assuming this means all the other nodes in the cluster. Running certgen gives me a .crt and a .key for each one.

Now I understand these need to be "signed"

So on node 1 - I run certgen -csr, which again allows me to add all the "instances" (I'm assuming this means the other nodes). This gives me a .csr and a .key. I'm assuming this is the "signed" key (by the .csr) and would replace the first key created by certgen.

so my directories look like this

nn775 /usr/share/elasticsearch/bin/config/x-pack # ls -ltR
.:
total 24
drwxr-xr-x 2 root root 4096 Mar 22 10:18 sn776
drwxr-xr-x 2 root root 4096 Mar 22 10:18 dn779
drwxr-xr-x 2 root root 4096 Mar 22 10:18 dn778
drwxr-xr-x 2 root root 4096 Mar 22 10:18 dn777
drwxr-xr-x 2 root root 4096 Mar 22 10:18 nn775
drwxr-xr-x 2 root root 4096 Mar 21 16:15 ca

./sn776:
total 12
-rw-r--r-- 1 root root  940 Mar 22 10:12 sn776.csr
-rw-r--r-- 1 root root 1675 Mar 22 10:12 sn776.key
-rw-r--r-- 1 root root 1289 Mar 21 16:13 sn776.crt

./dn779:
total 12
-rw-r--r-- 1 root root  940 Mar 22 10:12 dn779.csr
-rw-r--r-- 1 root root 1675 Mar 22 10:12 dn779.key
-rw-r--r-- 1 root root 1289 Mar 21 16:13 dn779.crt

....

Note the .crt files are from yesterday, when I ran certgen, and are not part of the signing process.

in my elasticsearch.yml
xpack.ssl.key: /usr/share/elasticsearch/bin/config/x-pack/sn776/sn776.key
xpack.ssl.certificate: /usr/share/elasticsearch/bin/config/x-pack/sn776/sn776.crt
xpack.ssl.certificate_authorities: [ "/usr/share/elasticsearch/bin/config/x-pack/ca/ca.crt" ]
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true

I copy the directories to the hosts and on startup get "Invalid signature on ECDH server key exchange message"

I'm thinking that error is saying I have something wrong in my key creation

Here I have started node 1 - then starting node 2 generates the errors in node 1's logfile

log is here
https://sites.google.com/site/developtroubleshooting/home/logs

I think I have a fair grasp on the process - just need to get it right - any suggestions would be greatly appreciated so I can get this to work

Yes, the "instances" in this context are cluster nodes.
Strictly speaking, the instance is the underlying server rather than the node, and it's theoretically possible to have multiple nodes on a single instance by running multiple elasticsearch processes on a single machine.

If you run certgen in its default setting, then it automatically signs the certificates. You don't need to do anything else.
It will sign them using an automatically generated CA key, and you will need to configure your cluster to trust that CA, but the TLS instructions assume that's what you want to do and provide the appropriate configuration.

Running certgen -csr is where you've gone astray. A csr is a Certificate Signing Request. It is a standard format for asking someone else to sign your certificate for you; it is not a signature, and the .key that comes with it is not a "signed" key.

The -csr option to certgen is for when you want to generate new keys and signing requests, and then send the requests off to be signed. Typically that is by a public/commercial CA, or a CA internal to your company/organisation.

There are cases where you might want to use certificates that are signed by a trusted authority (CA), but it is not required, and the simplest way to set up SSL/TLS within a cluster is to avoid the csr process entirely and just work with the .key and .crt files that certgen generates.

It appears that the problem is that you have mixed .key files and .crt files from 2 different executions of certgen.

Essentially, SSL/TLS works like this (note, this is overly simplified, but it will give you the basic idea)

  • sn776.key is a private key that you store on your server. Anyone who has that key file can run a server and use the key to prove that they are sn776.
  • sn776.crt is a certificate that is generated from that key. It is public identifying information. Having the certificate doesn't prove anything, and your server will hand it out to anyone who needs it.
  • When someone connects over TLS to your sn776 server, the server will provide the certificate as identifying information, and then use the secret information in the key to prove that it is in fact the real owner of the certificate.

The X-Pack certgen tool generates an individual .key and .crt (or .csr if you ask for that instead) for each node every time it runs.
The .key and .crt are a pair and need to stay together, you can't mix-and-match between runs.

As you can see in your directory listing, the timestamps for your sn776.key and sn776.crt are quite different. This is because they were generated at different times and aren't related to each other.
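If you would rather check the pairing directly than rely on timestamps, here is a minimal sketch using Python's standard `ssl` module. It relies on the documented behaviour that `load_cert_chain` raises `ssl.SSLError` when the private key does not match the certificate. The function name `key_matches_cert` is made up for illustration, and the sketch assumes the key file is not password protected.

```python
import ssl


def key_matches_cert(cert_path: str, key_path: str) -> bool:
    """Return True if the private key in key_path belongs to the certificate in cert_path.

    Hypothetical helper for illustration. It also returns False when either
    file is not valid PEM. A missing file raises FileNotFoundError instead.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # load_cert_chain raises ssl.SSLError if the key does not match the certificate
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except ssl.SSLError:
        return False
    return True


# Example usage with the paths from this thread (run it on the node itself):
# base = "/usr/share/elasticsearch/bin/config/x-pack/sn776"
# print(key_matches_cert(base + "/sn776.crt", base + "/sn776.key"))
```

A result of False for a node's directory would point to the kind of mix-up described above, where the .key and .crt come from different runs.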

If you have the original zip file from the first time you ran certgen, then just use the ca, .crt and .key files from that. Throw away the zip file that you got when you ran with the -csr option, you don't need it.

Thanks so much for the response!! I'm so depressed here.... :)

I made progress - I created a new cert bundle using an instances.yml file

ie
instances:
  - name: "nn775"
    ip:
      - "192.168.1.180"
    dns:
      - "nn775"
  - name: "sn776"
    ip:
      - "192.168.1.181"
    dns:
      - "sn776"
  - name: "dn777"
    ip:
      - "192.168.1.182"
    dns:
      - "dn777"
  - name: "dn778"
    ip:
      - "192.168.1.183"
    dns:
      - "dn778"
  - name: "dn779"
    ip:
      - "192.168.1.184"
    dns:
      - "dn779"

/usr/share/elasticsearch/bin/x-pack/certgen -in /root/instances.yml

unzipped the file in /usr/share/elasticsearch/bin/config/x-pack
gives me
dn778 /usr/share/elasticsearch/bin/config/x-pack :frowning: # l
total 40K
drwxr-xr-x 2 root root 4.0K Mar 22 21:30 sn776
drwxr-xr-x 2 root root 4.0K Mar 22 21:30 nn775
drwxr-xr-x 2 root root 4.0K Mar 22 21:30 dn779
drwxr-xr-x 2 root root 4.0K Mar 22 21:30 dn778
drwxr-xr-x 2 root root 4.0K Mar 22 21:30 dn777
drwxr-xr-x 2 root root 4.0K Mar 22 21:30 ca

I copy the x-pack directory to the other 4 nodes and finally there are no complaints in the es log file... all the nodes see each other!
BUT...
When I go to Kibana I get the "not trusted" https error

Kibana.yml
    elasticsearch.url: "https://192.168.1.180:9200"
    elasticsearch.ssl.verify: false

    console.proxyConfig:
     - match:
        host: "*"
        port: "{9200..9202}"

       ssl:
        ca: "/usr/share/elasticsearch/bin/config/x-pack/ca/ca.crt"

    server.ssl.key: /usr/share/elasticsearch/bin/config/x-pack/nn775/nn775.key
    server.ssl.cert: /usr/share/elasticsearch/bin/config/x-pack/nn775/nn775.crt

If I open
https://192.168.1.180:9200
in a browser I get the "not secure" https warning

and if I run curl commands against
curl -XGET https://nn775:9200/_cluster/health?pretty
It complains about the certificate not being correct

curl: (60) Peer certificate cannot be authenticated with known CA certificates

Just need a little more tweaking... I've been at this 2 days...

Is generating all the certs in one bundle, as I'm doing, and copying them to each node the correct approach?

also fwiw... nn775 is the originating server where ES and kibana run from and where the cert bundle was created

Sorry, I had meant to specifically call out Kibana in my previous response.

This is one of those cases where you might want to use certificates that are signed by a trusted authority. You don't have to, but you might want to.

The certgen tool cannot generate a certificate that is automatically trusted in your web browser or by curl.

This gets into the other part of how certificates work - trust.

One of the features of SSL/TLS is trusting that, when you connect to a server that claims to be xxx.yy.z, that it really is that server and you haven't been sneakily directed off somewhere else.

Within an elasticsearch cluster, we can facilitate that trust network by using the ca.crt file. That CA has signed every one of the certificates in your zip file, and it hasn't signed anything else (because certgen just invented it) so trusting that CA causes your cluster to trust exactly the set of certificates you generated. (*)

(*) Assuming you keep the ca.key locked away. If you pass that around then anyone who gets it can use it to sign anything they want and use that to join your cluster.

We can do that because the cluster is a closed system and we control every part of it. So it doesn't care what the rest of the world is doing, it just has its own little trust network.

General purpose web agents (such as your desktop browser, and curl) are not part of that trust network. They don't trust your new ca.crt (and why should they - you generated it by running a tool that anyone in the world could download and run).

To put it another way, I can go and replicate all the steps you took and generate my own ca.crt and nn775 certificate pair and run my own node that claims to be your nn775 instance. Your browser needs to be able to tell which one of us is legitimate, and nothing in the process so far has done that.

Your choices are:

  • Tell your browsers to be insecure. They just trust that any server is who they claim to be. Not a great idea, but it works OK for local testing. You can pass the --insecure option to curl for this.
  • Teach your browsers to trust your newly generated CA. For curl you can use the --cacert or --capath options. For your desktop browser you will need to load the CA cert into the trusted certificate authorities. That process varies by browser and operating system, and there are security implications to consider. Since I don't know your local environment I can't really offer much more advice there.
  • Get your certificates signed by a trusted CA. A commercial CA isn't going to sign a certificate for nn775 because that's not a full domain name, and there's nothing in the name to distinguish your nn775 server from my nn775 server. Your organisation might have an internal CA that it uses for this purpose. Most large companies do.
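The first two choices apply to any client, not just curl and browsers. As a rough sketch, this is how they might look in a Python script using only the standard library. The helper names `make_tls_context` and `cluster_health` are made up for illustration, and authentication is left out, so a cluster with X-Pack security enabled would most likely answer with a 401 until credentials are added.

```python
import ssl
import urllib.request
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None, insecure: bool = False) -> ssl.SSLContext:
    """Build a client-side TLS context (hypothetical helper for illustration).

    insecure=True   -> choice 1: trust any server (local testing only)
    ca_file="..."   -> choice 2: trust only certificates signed by that CA
    neither         -> fall back to the system's default trusted CAs
    """
    if insecure:
        context = ssl.create_default_context()
        # Hostname checking must be switched off before verification is disabled
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        return context
    # With cafile set, only that CA is trusted; with None, the system CAs are loaded
    return ssl.create_default_context(cafile=ca_file)


def cluster_health(url: str, context: ssl.SSLContext) -> str:
    """Fetch a URL over HTTPS using the given TLS context and return the body."""
    with urllib.request.urlopen(url, context=context) as response:
        return response.read().decode("utf-8")


# Example usage with the paths and host names from this thread:
# ca = "/usr/share/elasticsearch/bin/config/x-pack/ca/ca.crt"
# print(cluster_health("https://nn775:9200/_cluster/health?pretty", make_tls_context(ca_file=ca)))
```

Because hostname checking stays on in the second case, the name in the URL has to match one of the dns or ip entries that were given to certgen for that node.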

Also, my Kibana is accessing data and management with no problem, and the Kibana logs look like it is interacting with the Elasticsearch data correctly.

I'm still getting the "site can't be trusted" error in the web browser and from curl commands using https://
