When attempting to enable SSL on the transport, we get the following error in the log file (pointing to an untrusted certificate authority), and the nodes will
not communicate with one another.
[2016-03-22 10:48:17,145][WARN ][shield.transport.netty ] [node01] exception caught on transport layer [[id: 0x098be7f4, /192.168.2.103:52609 => /192.168.2.100:9300]], closing connection
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
I have added the 3 certificates that make up our trust chain into the keystore for shield, and I also created a truststore.jks and put the
3 certificates in that as well. Any ideas what we are missing?
Location of keystore:
/opt/elasticsearch/share/elasticsearch/config/shield/node01.jks
Location of truststore:
/opt/elasticsearch/share/elasticsearch/config/shield/truststore.jks
We have tried this with and without the resolve.name setting and it still doesn't work. We also validated that our certificate has the IP address as a SAN.
This definitely seems to be an issue with validating the certificate. If you don't mind, could you provide the output of keytool -list -keystore file.jks for both the keystore and the truststore?
Here are the contents of the keystores (obviously some data has been modified to not harm any animals or careers...)
bash-4.1$ keytool -list -keystore node01.jks
Keystore type: JKS
Keystore provider: SUN
Your keystore contains 4 entries
policy, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): 2C:E7:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR:46:5B:DB
issuing, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): 96:A0:84:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR::19:68
root, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): F5:59:64:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR:76:8A:5D
node01.mydomain.com, Mar 17, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): 72:2B:A4:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR::A3:E1
bash-4.1$ keytool -list -keystore truststore.jks
Keystore type: JKS
Keystore provider: SUN
Your keystore contains 3 entries
policy, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): 2C:E7:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR:46:5B:DB
issuing, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): 96:A0:84:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR::19:68
root, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): F5:59:64:THESE ARE NOT THE CERTS YOU ARE LOOKING FOR:76:8A:5D
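For anyone comparing listings like the two above by eye, the check can be scripted. This is only a minimal sketch, assuming the `keytool -list` layout shown in this thread (an entry line followed by a fingerprint line). The function names and the sample fingerprints are made up for illustration. It reports CA entries in the keystore whose fingerprint does not appear in the truststore.

```python
import re

# Entry header lines look like "root, Mar 17, 2016, trustedCertEntry,".
ENTRY_RE = re.compile(
    r"^(?P<alias>[^,]+), .+, (?P<kind>trustedCertEntry|PrivateKeyEntry),$"
)


def parse_keytool_list(text):
    """Map alias -> (entry type, fingerprint) from `keytool -list` output."""
    entries = {}
    alias = kind = None
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            alias, kind = match.group("alias"), match.group("kind")
        elif alias and line.startswith("Certificate fingerprint"):
            # Keep everything after the first colon as the fingerprint.
            entries[alias] = (kind, line.split(":", 1)[1].strip())
            alias = None
    return entries


def untrusted_ca_aliases(keystore, truststore):
    """CA aliases in the keystore whose fingerprint is absent from the truststore."""
    trusted = {fingerprint for _, fingerprint in truststore.values()}
    return sorted(
        alias
        for alias, (kind, fingerprint) in keystore.items()
        if kind == "trustedCertEntry" and fingerprint not in trusted
    )


# Hypothetical listings with made-up fingerprints, for illustration only.
KEYSTORE_LISTING = """\
Your keystore contains 3 entries
issuing, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): AA:01
root, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): BB:02
node01.mydomain.com, Mar 17, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): CC:03
"""
TRUSTSTORE_LISTING = """\
Your keystore contains 1 entry
root, Mar 17, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): BB:02
"""

keystore = parse_keytool_list(KEYSTORE_LISTING)
truststore = parse_keytool_list(TRUSTSTORE_LISTING)
print(untrusted_ca_aliases(keystore, truststore))  # ['issuing']
```

In the listings posted above, the three CA fingerprints are the same in both stores, so a check like this would come back empty. That suggests the trust chain itself is not what is missing.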
Here is a more interesting error in the Elasticsearch log file. It looks like the client may be presenting its simple hostname instead of a fully qualified hostname.
The subject alternative names in our certs do NOT contain just the hostname, only the fully qualified DNS name and IP address:
DNS Name=node01.com
IP Address=192.168.2.100
How can I get the nodes to present their FULLY qualified DNS names when initiating SSL connections on 9300?
ERROR:
[2016-03-23 09:20:07,231][WARN ][shield.transport.netty ] [node03] exception caught on transport layer [[id: 0xa303628c, /192.168.2.103:39301 :> node01/192.168.2.100:9300]], closing connection
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1218)
at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
at org.jboss.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1392)
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1255)
... 18 more
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching node01 found.
at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:204)
at sun.security.util.HostnameChecker.match(HostnameChecker.java:95)
at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
... 26 more
This indicates that node01 is what is being used to connect to the node and it is not available as a SAN entry or the CN. Can you share the network configuration of your nodes? Do you have a SAN for IP addresses and DNS entries in the cert?
Your network configuration for Elasticsearch will come into play for that. That's why I asked for the network configuration from the elasticsearch.yml. I believe if you set network.host or network.publish_host, Elasticsearch will present the name defined there (you can specify the fully qualified name).
Do you use IP addresses in any aspect of configuration? Maybe for discovery (unicast ping hosts)? Is the short name (node01) registered in DNS or a local hosts file?
DNS has the fully qualified host name and I don't have access to modify the hosts file on the servers. I have spoken with our Certificate team and they are OK with me requesting new certs if that solves the problem.
I think using the FQDNs wherever you specify network configuration or addresses for the cluster to communicate to should solve the issue (if it is the same problem).
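To illustrate why the short name fails while the FQDN or the IP address would pass, here is a rough sketch of the comparison that hostname verification performs. It is a simplification, not the JVM's actual HostnameChecker: wildcard entries are ignored, and the function name is made up for this example. The SAN values are the ones quoted earlier in the thread.

```python
import ipaddress


def matches_certificate(host, dns_sans, ip_sans):
    """Return True if `host` matches one of the certificate's SAN entries.

    Simplified: exact, case-insensitive DNS comparison with no wildcard support.
    """
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # Not an IP literal, so treat it as a DNS name.

    if address is not None:
        # IP literals are only compared against IP Address SAN entries.
        return any(address == ipaddress.ip_address(ip) for ip in ip_sans)
    # Names are only compared against DNS Name SAN entries.
    return any(host.lower() == name.lower() for name in dns_sans)


# SAN entries as posted in this thread.
DNS_SANS = ["node01.com"]
IP_SANS = ["192.168.2.100"]

print(matches_certificate("node01", DNS_SANS, IP_SANS))         # False
print(matches_certificate("node01.com", DNS_SANS, IP_SANS))     # True
print(matches_certificate("192.168.2.100", DNS_SANS, IP_SANS))  # True
```

The first case is the one in the stack trace: the connecting node used node01, which is not among the DNS SAN entries, so verification fails even though the certificate chain is trusted.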
I'm glad to hear you have it solved. One followup for you:
Were you using the default ciphers and saw a log message about this? Are you using OpenJDK from a linux distribution?
If so, it most likely is that your JVM does not enable one of the authentication providers by default. Enabling unlimited ciphers will not resolve this.
Yes, we were using the default ciphers. Yes, we use OpenJDK. Here is the error we were seeing:
[2016-03-25 10:49:36,210][ERROR][shield.ssl ] [node01] unsupported ciphers [[TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA]] were requested but cannot be used in this JVM. If you are trying to use ciphers
with a key length greater than 128 bits on an Oracle JVM, you will need to install the unlimited strength
JCE policy files. Additionally, please ensure the PKCS11 provider is enabled for your JVM.
This fixes the unsupported Cipher above - but we do not want this long term. We have to update our JDK to support this cipher. Do you have a good article on OpenJDK to fix this?
If you are lucky, it may be as simple as uncommenting the appropriate line in ${java.home}/lib/security/java.security for the SunPKCS11 provider, which may look something like: