Error "Path does not chain with any of the trust anchors" when enabling TLS between nodes

Hi,

I have been trying to enable TLS encryption between my nodes. I followed the instructions in the documentation, generated PKCS#12 (p12) certificates for each node, and configured my cluster like this (the certificates don't have a password):

cluster.name: eLABsticsearch
node.name: elastic01

node.data: True
node.master: True
node.ingest: True
node.ml: True
search.remote.connect: false

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.host: 172.28.128.11
http.port: 9200

discovery.zen.ping.unicast.hosts: ["172.28.128.11","172.28.128.12","172.28.128.13"]
discovery.zen.minimum_master_nodes: 2

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: full
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/elastic01.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/elastic01.p12

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: /etc/elasticsearch/elastic01.p12
xpack.security.http.ssl.truststore.path: /etc/elasticsearch/elastic01.p12

When starting the cluster I get these errors in the logs:

Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors
	at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:362) ~[?:?]
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:270) ~[?:?]
	at sun.security.validator.Validator.validate(Validator.java:260) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:281) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136) ~[?:?]
	at org.elasticsearch.xpack.core.ssl.SSLService$ReloadableTrustManager.checkServerTrusted(SSLService.java:606) ~[?:?]
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601) ~[?:?]
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) ~[?:?]
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052) ~[?:?]
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:992) ~[?:?]
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:989) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_161]
	at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467) ~[?:?]
	at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1364) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1272) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1127) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 15 more
Caused by: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors
	at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:153) ~[?:?]
	at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:79) ~[?:?]
	at java.security.cert.CertPathValidator.validate(CertPathValidator.java:292) ~[?:1.8.0_161]
	at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:357) ~[?:?]
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:270) ~[?:?]
	at sun.security.validator.Validator.validate(Validator.java:260) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:281) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136) ~[?:?]
	at org.elasticsearch.xpack.core.ssl.SSLService$ReloadableTrustManager.checkServerTrusted(SSLService.java:606) ~[?:?]
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601) ~[?:?]
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) ~[?:?]
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052) ~[?:?]
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:992) ~[?:?]
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:989) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_161]
	at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467) ~[?:?]
	at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1364) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1272) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1127) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[?:?]
	... 15 more
[2018-05-08T22:43:28,020][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [elastic01] client did not trust this server's certificate, closing connection NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:9300, remoteAddress=/172.28.128.13:41190}

Thanks a mil in advance for any help!

Can you list the steps you took to do this?

Based on the error you're receiving, it looks like your nodes' certificates were signed by different CAs, which is most likely caused by a mistake in the certificate generation process.

OK then I think I understand my mistake (which sounds obvious now).

  1. I went onto each node (actually an Ansible playbook did) and used bin/x-pack/certutil to generate a PKCS#12 certificate.
  2. For this I created a separate instance.yml file for each node, so I could run certutil in silent mode (because of Ansible).
  3. I updated my elasticsearch.yml file so it would point to each certificate.

So I assume the problem comes from the fact that I should have created one single instance.yml file containing information about all the nodes, and then run certutil once from one of the nodes. Running it separately on each node probably generated a different CA each time... Am I on the right track?
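For reference, a single instances.yml covering all three nodes could look roughly like the sketch below. The IP addresses are the ones listed in discovery.zen.ping.unicast.hosts in the config above; the names elastic02 and elastic03 are assumed here for illustration. Listing each node's IP matters because verification_mode: full checks the certificate against the address being connected to.

```yaml
# instances.yml: one entry per node, processed in a single certutil run
# so that every node certificate is signed by the same CA.
# elastic02 and elastic03 are hypothetical names; the IPs come from the
# discovery.zen.ping.unicast.hosts list.
instances:
  - name: "elastic01"
    ip:
      - "172.28.128.11"
  - name: "elastic02"
    ip:
      - "172.28.128.12"
  - name: "elastic03"
    ip:
      - "172.28.128.13"
```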

Thanks anyhow for your prompt answer.

Yes, that is correct.
There are a few ways you can approach this, but the main options are:

  • generate everything at once using instances.yml (on a single machine)
  • explicitly generate a CA, and then generate a certificate for each node using that CA (on a single machine)
  • explicitly generate a CA, copy it (with its key) to each server, and then generate a certificate on each node using that CA. (I'd discourage this, though, because it means your CA key is copied to lots of machines, which opens up an unnecessary attack vector.)
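Whichever option you pick, once every node certificate comes from the same CA, the TLS section of the original elasticsearch.yml can keep the same shape: each node points at its own p12, and because a certificate generated by certutil against a CA also carries that CA's certificate, the same file can serve as both keystore and truststore. A sketch for the second node, assuming it is called elastic02 and its certificate was copied to /etc/elasticsearch/elastic02.p12 (both names hypothetical):

```yaml
# Node-specific values for the second node (hypothetical name and path).
node.name: elastic02
network.host: 172.28.128.12

xpack.security.transport.ssl.enabled: true
# "full" also checks that the certificate matches this node's IP (172.28.128.12).
xpack.security.transport.ssl.verification_mode: full
# This node's certificate, signed by the CA shared by all nodes.
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/elastic02.p12
# The same p12 also holds the shared CA certificate, so it works as the truststore.
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/elastic02.p12
```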

Awesome, thanks Tim!!!
