Enabling Shield and SSL on a node with multiple Elasticsearch instances

Hi

We have a problem setting up SSL and Shield on a cluster.

The cluster consists of 3 machines with two Elasticsearch instances each. It is a test environment that reproduces the deployment environment in terms of Elasticsearch nodes.

The instances on the same node use the same IP address, so I used FQDNs to differentiate the instances.

The problem is that the software (the Java SSL/TLS library, I suspect) uses the first IP-to-FQDN resolution it finds in the hosts file of the server.

For example, one server has two instances with the following FQDNs:
el-ro1.logga.local and el-wn1.logga.local

In the hosts file they both resolve to the same IP address.
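To illustrate the suspected behaviour, here is a rough Python sketch of a reverse lookup through a hosts file. It is only a model, not a real resolver: the hosts file layout and the `reverse_lookup` helper are assumptions made for illustration, with el-ro1 assumed to be listed first for the shared address.

```python
# Simplified model of a reverse lookup through a hosts file: the first
# line that lists the address supplies the name. Not a real resolver.

# Assumed layout; the actual hosts file is not shown in this thread.
HOSTS_FILE = """\
10.13.195.187 el-ro1.logga.local
10.13.195.187 el-wn1.logga.local
"""


def reverse_lookup(ip, hosts_text):
    """Return the first hostname listed for `ip`, or None if absent."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]
    return None


print(reverse_lookup("10.13.195.187", HOSTS_FILE))
```

Under that assumption, both instances map back to el-ro1.logga.local, whichever one is actually listening on the port being contacted.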

So when el-ro1 tries to contact el-wn1 on the same machine, I see the stack trace below.

I tried using FQDNs in discovery.zen.ping.unicast.hosts, but nothing changed.

Any hint?

Regards


[2016-04-07 16:31:32,785][ERROR][shield.transport.netty ] [el-ro1] SSL/TLS handshake failed, closing channel: General SSLEngine problem
[2016-04-07 16:31:32,789][WARN ][shield.transport.netty ] [el-ro1] exception caught on transport layer [[id: 0x1c433071, /10.13.195.187:35433 :> el-ro1.logga.local/10.13.195.187:9300]], closing connection
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
.........
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
...........
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1255)
... 18 more
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching el-ro1.logga.local found.
at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:204)
at sun.security.util.HostnameChecker.match(HostnameChecker.java:95)
.....
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
... 26 more
[2016-04-07 16:31:34,265][ERROR][shield.transport.netty ] [el-ro1] SSL/TLS handshake failed, closing channel: General SSLEngine problem

I think you need to use the FQDN in all network settings. It sounds like you specify at least the bound address; do you also specify a publish address? If you set network.host, then change that to the FQDN in addition to using the FQDN in the unicast hosts list.

This is my config for network parameters

network.host: 0.0.0.0

network.publish_host: el-ro1.logga.local

Should I set

network.host: el-ro1.logga.local

too?

I would try setting network.host to the FQDN on all the nodes and see if it helps. I should have asked this earlier: do you use the FQDN as a SAN in your certificates?

Unfortunately, that does not solve the problem.
I used the following commands for the certificate requests:

keytool -certreq -alias el-wn1 -keystore el-wn1.jks -file el-wn1.csr -keyalg rsa -ext san=dns:el-wn1.logga.local,ip:10.13.195.187

keytool -certreq -alias el-ro1 -keystore el-ro1.jks -file el-ro1.csr -keyalg rsa -ext san=dns:el-ro1.logga.local,ip:10.13.195.187
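To make it easier to see which names each request asks for, here is a small Python sketch that splits the `-ext san=...` value used above into its entries. The `parse_san_ext` helper is hypothetical and only handles the simple `type:value` form shown in these commands; whether the signed certificate actually carries these entries is up to the CA.

```python
def parse_san_ext(ext):
    """Split a keytool-style 'san=type:value,...' string by entry type."""
    name, _, value = ext.partition("=")
    if name.strip().lower() not in ("san", "subjectalternativename"):
        raise ValueError("not a SAN extension: " + ext)
    entries = {}
    for item in value.split(","):
        # Split on the first colon only, so the value may contain colons.
        kind, _, val = item.partition(":")
        entries.setdefault(kind.strip().lower(), []).append(val.strip())
    return entries


wn1 = parse_san_ext("san=dns:el-wn1.logga.local,ip:10.13.195.187")
ro1 = parse_san_ext("san=dns:el-ro1.logga.local,ip:10.13.195.187")

# Each request names only its own FQDN, but both share the IP entry.
print(wn1["dns"], ro1["dns"], wn1["ip"] == ro1["ip"])
```

So the DNS entries differ per instance, while the IP entry is the only one common to both requests.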

Should I use only the IP address in this case?

Using only the IP may be the best choice. See the documentation on shield.ssl.hostname_verification.resolve_name.

Thanks Jay

Setting

shield.ssl.hostname_verification.resolve_name: false

solved the issue.
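For anyone wondering why this setting makes a difference, here is a rough Python model of the two verification modes as I understand them from the Shield documentation. It is a simplified sketch, not Shield's actual code: `verify_peer` and the lookup table are hypothetical, and el-ro1 is assumed to be the first hosts entry for the shared address.

```python
def verify_peer(connect_ip, san_dns, san_ips, resolve_name, reverse_lookup):
    """Return True when the certificate SANs match the connection target."""
    if resolve_name:
        # Reverse-resolve the IP, then compare against the DNS SAN entries.
        name = reverse_lookup(connect_ip)
        return name is not None and name.lower() in [d.lower() for d in san_dns]
    # No lookup: the IP itself must appear among the IP SAN entries.
    return connect_ip in san_ips


# Assumed: the hosts file lists el-ro1 first for the shared address.
lookup = {"10.13.195.187": "el-ro1.logga.local"}.get

# SAN entries requested for el-wn1 in the keytool command earlier.
wn1_dns = ["el-wn1.logga.local"]
wn1_ips = ["10.13.195.187"]

print(verify_peer("10.13.195.187", wn1_dns, wn1_ips, True, lookup))   # False
print(verify_peer("10.13.195.187", wn1_dns, wn1_ips, False, lookup))  # True
```

With the lookup enabled, el-wn1's certificate is checked against el-ro1.logga.local and fails, which matches the "No subject alternative DNS name matching el-ro1.logga.local found" error. With it disabled, the IP entry in the SAN is enough.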

Thanks for your help

Giuseppe