Node fails to join the cluster after updating its certificate

Dear all,

Today I'm once again dealing with certificates!

Every time I touch them, there's always an issue!

I started by replacing the certificate on one ingest node (ingest-prod01.domain.com). When I restarted the Elasticsearch service, the node no longer joined the cluster, and the ES logs filled up with very long warning messages.

Here is the one from the ingest node:

[2023-10-25T18:01:56,924][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [ingest-prod01.domain.com] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.32.14.108:9300, remoteAddress=/10.32.14.106:48256, profile=default}
[2023-10-25T18:01:56,933][INFO ][o.e.c.c.JoinHelper       ] [ingest-prod01.domain.com] failed to join {elastic-master-prod03.domain.com}{sZLZdRYkRCy-5iQyXQUlRw}{_B1WgHkZR9GkFkZj8-3l1Q}{elastic-master-prod03.domain.com}{10.32.14.106}{10.32.14.106:9300}{m}{xpack.installed=true} with JoinRequest{sourceNode={ingest-prod01.domain.com}{tzX0ezvGTkyRFOJfa6KnWQ}{5hV3gce9TvahNpf5TgsJMQ}{ingest-prod01.domain.com}{10.32.14.108}{10.32.14.108:9300}{ir}{xpack.installed=true}, minimumTerm=77, optionalJoin=Optional[Join{term=77, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={ingest-prod01.domain.com}{tzX0ezvGTkyRFOJfa6KnWQ}{5hV3gce9TvahNpf5TgsJMQ}{ingest-prod01.domain.com}{10.32.14.108}{10.32.14.108:9300}{ir}{xpack.installed=true}, targetNode={elastic-master-prod03.domain.com}{sZLZdRYkRCy-5iQyXQUlRw}{_B1WgHkZR9GkFkZj8-3l1Q}{elastic-master-prod03.domain.com}{10.32.14.106}{10.32.14.106:9300}{m}{xpack.installed=true}}]}

The IP 10.32.14.108 is the ingest node whose certificate I just updated.
The IP 10.32.14.106 is the currently active master of my cluster.

On the active master server, I found these messages:

[2023-10-25T18:31:12,695][WARN ][o.e.c.s.DiagnosticTrustManager] [elastic-master-prod03.domain.com] failed to establish trust with server at [10.32.14.108]; the server provided a certificate with subject name [CN=ingest-prod01.domain.com,OU=STI,O=Company,L=City,ST=State,C=CA], fingerprint [4eda725bd07a95889f125b26ba47c8f359c79e5a], keyUsage [digitalSignature, keyEncipherment, dataEncipherment] and extendedKeyUsage [clientAuth, serverAuth]; the certificate is valid between [2023-10-24T18:05:19Z] and [2025-10-23T18:05:19Z] (current time is [2023-10-25T22:31:12.695239702Z], certificate dates are valid); the session uses cipher suite [TLS_AES_256_GCM_SHA384] and protocol [TLSv1.3]; the certificate's subject alternative names cannot be parsed; the certificate is issued by [CN=PKIS01-CA,DC=domain,DC=ca] but the server did not provide a copy of the issuing certificate in the certificate chain; the issuing certificate with fingerprint [6007900a5e078378e4b2443b7a4d11c35d690e8f] is trusted in this ssl context ([xpack.security.transport.ssl (with trust configuration: PEM-trust{/etc/elasticsearch/certs/ROOT-CA-Base64.crt,/etc/elasticsearch/certs/SUB-CA-01-Base64.crt,/etc/elasticsearch/certs/SUB-CA-01-2022-Base64.crt})])

[2023-10-25T18:31:12,697][WARN ][o.e.t.TcpTransport       ] [elastic-master-prod03.domain.com] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.32.14.106:52394, remoteAddress=10.32.14.108/10.32.14.108:9300, profile=default}], closing connection

Note: I changed some information in these messages for security reasons.

Once again, the ingest node tries to connect to the master node and fails.

From these logs, two things stand out:

  • "the certificate's subject alternative names cannot be parsed"
  • "the certificate is issued by [CN=PKIS01-CA,DC=domain,DC=ca] but the server did not provide a copy of the issuing certificate in the certificate chain"

When I run openssl on the new certificate file, I can easily see the alternative names:

X509v3 Subject Alternative Name: 
   DNS:ingest-prod01.domain.com, DNS:localhost, IP Address:127.0.0.1, IP Address:10.32.14.108

The old certificate, which hasn't expired yet, has the same values there. If I put that certificate back, the ingest node starts and joins the ELK cluster as expected.

I'm also confused: who doesn't trust whom here? The ingest node or the master node?

I'm sorry to say it, but certificates will always be a nightmare for me!

Does anyone have an idea what may be wrong?

Regards,
Yanick

I'm also confused: who doesn't trust whom here? The ingest node or the master node?

It could be either way (*), but based on the log messages you provided, the master is trying to connect to the ingest node and does not trust the ingest node's certificate:

[ingest-prod01.domain.com] client did not trust this server's certificate ... localAddress=/10.32.14.108:9300, remoteAddress=/10.32.14.106:48256
[elastic-master-prod03.domain.com] failed to establish trust with server at [10.32.14.108]

The master-node (10.32.14.106) is trying to open a connection to 10.32.14.108:9300 (the ingest node). That makes the master-node the "client" and the ingest-node the "server".
The master refuses to connect (and logs an explanation of why). It then tells the ingest node that it is closing the connection because it doesn't trust it, and the ingest node logs that. The ingest node can't give a detailed explanation, because the master doesn't provide any more detail than "no, I don't trust you".

(*) Nodes open connections to each other, and they need to trust each other. If things are misconfigured, it's often the case that neither trusts the other.
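To make the client/server reasoning above concrete, here is a small Java sketch that reads the Netty4TcpChannel{...} part of a log line and reports which IP is the TLS client and which is the server. It assumes every node listens on the default transport port 9300 and that the address format matches the two lines quoted in this thread; the class and method names are hypothetical, and this is an illustration, not an Elasticsearch tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TransportRoles {
    // Assumption: nodes listen on the default transport port 9300.
    static final int TRANSPORT_PORT = 9300;

    // Matches "localAddress=<host>/<ip>:<port>, remoteAddress=<host>/<ip>:<port>",
    // where the optional host part before '/' may be empty.
    private static final Pattern CHANNEL = Pattern.compile(
        "localAddress=[^/,]*/([0-9.]+):(\\d+), remoteAddress=[^/,]*/([0-9.]+):(\\d+)");

    /** Returns "client=<ip> server=<ip>" for the connection described in the log line. */
    static String roles(String logLine) {
        Matcher m = CHANNEL.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no Netty4TcpChannel addresses found");
        }
        String localIp = m.group(1);
        int localPort = Integer.parseInt(m.group(2));
        String remoteIp = m.group(3);
        int remotePort = Integer.parseInt(m.group(4));
        if (localPort == TRANSPORT_PORT) {
            // The logging node accepted the connection on its listening port: it is the TLS server.
            return "client=" + remoteIp + " server=" + localIp;
        }
        if (remotePort == TRANSPORT_PORT) {
            // The logging node dialed out to the peer's listening port: it is the TLS client.
            return "client=" + localIp + " server=" + remoteIp;
        }
        throw new IllegalArgumentException("neither side uses port " + TRANSPORT_PORT);
    }

    public static void main(String[] args) {
        String ingestSide = "Netty4TcpChannel{localAddress=/10.32.14.108:9300, remoteAddress=/10.32.14.106:48256, profile=default}";
        String masterSide = "Netty4TcpChannel{localAddress=/10.32.14.106:52394, remoteAddress=10.32.14.108/10.32.14.108:9300, profile=default}";
        System.out.println(roles(ingestSide)); // client=10.32.14.106 server=10.32.14.108
        System.out.println(roles(masterSide)); // client=10.32.14.106 server=10.32.14.108
    }
}
```

The two quoted lines come from different connections (different ephemeral ports), but both identify the master (10.32.14.106) as the client and the ingest node (10.32.14.108) as the server.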

the certificate is issued by [CN=PKIS01-CA,DC=domain,DC=ca] but the server did not provide a copy of the issuing certificate in the certificate chain

This is normal and totally fine. The "but" (in "but the server ...") implies that there could be a problem, but that's only because the diagnostic message doesn't attempt to work out what went wrong; it just gives you all the useful information it can and leaves it to the reader to decide which part is important.
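To illustrate why a missing issuing certificate is harmless when that issuer is already trusted, the following Java sketch validates a chain containing only the node certificate against a set of CA files, using the JDK's PKIX CertPathValidator. It assumes PEM files like the ones in the log's PEM-trust configuration and disables revocation checking for simplicity; the class name and the node certificate file name are hypothetical, and it only approximates the kind of check a JDK trust manager performs, not Elasticsearch's own code.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.InvalidAlgorithmParameterException;
import java.security.cert.CertPath;
import java.security.cert.CertPathValidator;
import java.security.cert.CertificateFactory;
import java.security.cert.PKIXParameters;
import java.security.cert.TrustAnchor;
import java.security.cert.X509Certificate;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeafOnlyChainCheck {
    /** PKIX parameters that trust exactly the given CAs; revocation checking is off in this sketch. */
    static PKIXParameters buildParams(Set<TrustAnchor> anchors) throws InvalidAlgorithmParameterException {
        PKIXParameters params = new PKIXParameters(anchors); // throws if anchors is empty
        params.setRevocationEnabled(false);
        return params;
    }

    /** Loads one certificate (PEM or DER) from a file. */
    static X509Certificate load(CertificateFactory cf, String path) throws Exception {
        try (InputStream in = new FileInputStream(path)) {
            return (X509Certificate) cf.generateCertificate(in);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: LeafOnlyChainCheck <node-cert> <ca-cert> [<ca-cert> ...]");
            return;
        }
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        X509Certificate leaf = load(cf, args[0]);
        Set<TrustAnchor> anchors = new HashSet<>();
        for (int i = 1; i < args.length; i++) {
            // Any trusted CA (root or intermediate) can act as a PKIX trust anchor.
            anchors.add(new TrustAnchor(load(cf, args[i]), null));
        }
        // The path holds only the node certificate, as in the log: no issuing CA was sent.
        CertPath path = cf.generateCertPath(List.of(leaf));
        // Throws CertPathValidatorException if the leaf cannot be chained to an anchor.
        CertPathValidator.getInstance("PKIX").validate(path, buildParams(anchors));
        System.out.println("leaf-only chain validates against the configured CAs");
    }
}
```

For example (Java 11+ can run a single source file directly): java LeafOnlyChainCheck.java new-ingest-cert.pem /etc/elasticsearch/certs/ROOT-CA-Base64.crt /etc/elasticsearch/certs/SUB-CA-01-Base64.crt /etc/elasticsearch/certs/SUB-CA-01-2022-Base64.crt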

the certificate's subject alternative names cannot be parsed

It looks like this is the problem. I have literally never seen that message printed out in someone's logs before, and I wrote the message.
How did you generate this new certificate? There is some incompatibility between your certificate and the JDK on which Elasticsearch is running and I'd love to try and work out what the problem is.
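One way to see what the JDK makes of a certificate, independently of Elasticsearch, is to load it with CertificateFactory and call X509Certificate.getSubjectAlternativeNames(), which declares CertificateParsingException. That this is the call behind the "cannot be parsed" message is an assumption, not something confirmed in this thread. The sketch below takes the certificate file (PEM or DER) as its first argument; the class name is hypothetical.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.cert.CertificateFactory;
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.List;

public class SanCheck {
    /** Formats one [type, value] entry from getSubjectAlternativeNames() in openssl's style. */
    static String formatSan(List<?> entry) {
        int type = (Integer) entry.get(0);
        Object raw = entry.get(1);
        // Some GeneralName types (e.g. otherName) come back as raw DER bytes.
        String value = (raw instanceof byte[]) ? "<" + ((byte[]) raw).length + " DER bytes>" : String.valueOf(raw);
        switch (type) {
            case 1: return "email:" + value;
            case 2: return "DNS:" + value;
            case 6: return "URI:" + value;
            case 7: return "IP Address:" + value;
            default: return "type " + type + ":" + value;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: SanCheck <certificate-file>");
            return;
        }
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        try (InputStream in = new FileInputStream(args[0])) {
            X509Certificate cert = (X509Certificate) cf.generateCertificate(in);
            System.out.println("Subject: " + cert.getSubjectX500Principal());
            try {
                Collection<List<?>> sans = cert.getSubjectAlternativeNames();
                if (sans == null) {
                    System.out.println("No SAN extension present");
                    return;
                }
                for (List<?> entry : sans) {
                    System.out.println("  " + formatSan(entry));
                }
            } catch (CertificateParsingException e) {
                // A malformed SAN extension is expected to end up here.
                System.out.println("SAN extension could not be parsed by this JDK: " + e.getMessage());
            }
        }
    }
}
```

Running it with the same JDK that the node uses should either list entries such as DNS:ingest-prod01.domain.com and IP Address:10.32.14.108, or report the parsing failure.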


Hi @TimV,

First, thank you for your reply; it's very much appreciated.

So here was the problem.

When I wrote this post, I said there was an error related to "the certificate's subject alternative names cannot be parsed". However, using openssl I was able to see those SANs in the certificate.

One of my colleagues suggested testing the certificate with KeyStore Explorer, which is a Java application.

When I pasted the certificate into this tool and browsed to the Alternate Names, I got an error:

[screenshot of the error]

So that was the same error the ES server hit when it was starting up.

I then asked our security team to check whether something had gone wrong when they signed the CSR I sent them.

After they looked into it on their side, I resent the CSR, they re-signed it, and this time it worked fine.

They are not sure why the first certificate they sent me had some garbage in it, or why only Java noticed it.

This must be my karma with certificates!

Anyhow, this has finally been solved!

Thank you and best regards!

Yanick
