Netty Transport Issue - ES 5.6.4, Java8, Data Node Only

Hi All,

I hit a curly issue after upgrading a cluster to 5.6.4. IP addresses and hostnames have been sanitized.

2 out of 3 data nodes start OK and communicate back to the three dedicated master nodes. If I disable SSL/TLS on the node that shows the error below, it communicates happily with the master nodes (Zen unicast). However, when I enable TLS/SSL, the following error occurs and the data node doesn't join the cluster.

The same CA, private key and certificate are used on all three nodes; 2 work fine and 1 doesn't. iptables rules are consistent across all nodes in the cluster and all other servers are communicating fine. What I'm trying to ascertain is why this particular node is throwing the following error; it's cryptic. It's as if Elasticsearch isn't picking up the source IP of the system (guessing based on the L:0.0.0.0 reference); the target seems OK though. I haven't found anything useful on the master node of interest, but I may be overlooking something.

[2017-11-22T18:55:57,602][INFO ][o.e.n.Node ] [emd01] starting ...
[2017-11-22T18:55:57,882][INFO ][o.e.t.TransportService ] [emd01] publish_address {172.17.14.155:9300}, bound_addresses {172.17.14.155:9300}
[2017-11-22T18:55:57,894][INFO ][o.e.b.BootstrapChecks ] [emd01] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-22T18:55:58,261][WARN ][o.e.x.s.t.n.SecurityNetty4Transport] [emd01] write and flush on the network layer failed (channel: [id: 0xb33386fb, L:0.0.0.0/0.0.0.0:35440 ! R:emn01.testdomain.com/172.17.14.161:9300])
java.nio.channels.ClosedChannelException: null
at io.netty.handler.ssl.SslHandler.channelInactive(...)(Unknown Source) ~[?:?]
[2017-11-22T18:55:58,261][WARN ][o.e.x.s.t.n.SecurityNetty4Transport] [emd01] write and flush on the network layer failed (channel: [id: 0x0c2c884f, L:0.0.0.0/0.0.0.0:38290 ! R:emn01.testdomain.com/172.17.14.162:9300])
java.nio.channels.ClosedChannelException: null
at io.netty.handler.ssl.SslHandler.channelInactive(...)(Unknown Source) ~[?:?]
[2017-11-22T18:55:58,261][WARN ][o.e.x.s.t.n.SecurityNetty4Transport] [emd01] write and flush on the network layer failed (channel: [id: 0x1779c76d, L:0.0.0.0/0.0.0.0:47642 ! R:emn01.testdomain.com/172.17.14.163:9300])
java.nio.channels.ClosedChannelException: null

How would I go about troubleshooting this further? It appears to be SSL/TLS related (enabling that functionality) but all nodes are identical and 2 work. Any assistance would be grand.
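Since two nodes work and one doesn't, one cheap first step is to diff the SSL-related settings in elasticsearch.yml between a working node and the failing one. The following is only a rough sketch: it assumes the settings are written as flat dotted keys (for example `xpack.security.transport.ssl.enabled: true`) rather than nested YAML, and the helper names `ssl_settings` and `diff_settings` are made up for illustration.

```python
def ssl_settings(yml_text):
    """Collect flat xpack.* settings whose key mentions ssl.

    Assumes one "key: value" pair per line with dotted keys; nested YAML
    is not handled, and a '#' inside a value would be cut off as a comment.
    """
    settings = {}
    for line in yml_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line.startswith("xpack.") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if "ssl" in key:
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Feed it the text of elasticsearch.yml from each node; a key that appears with `None` on one side is set on one node and missing on the other.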

Please note, all data nodes in the cluster have been updated to the latest CentOS 7 as of 22/11/2017 NZT, and all are running the same version of Elasticsearch and Java 8.

Thanks,
Andrew

This fault was resolved by fixing a TLS/SSL misconfiguration on the data node that was failing to start.

Specifically,

xpack.security.transport.ssl.enabled: true was causing the nodes not to start.

xpack.security.http.ssl.enabled: true was OK.

I’m still working through the fault with the transport.ssl.enabled option, as I haven’t gotten it sorted yet. It may have been because the master nodes didn’t have transport SSL enabled before it was configured on the data and machine learning nodes. (Would anyone from the community be able to confirm whether this would be the cause? It may save some time.)
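One way to test that theory without restarting anything might be to check whether each master's transport port (9300 in the logs above) actually answers a TLS handshake. Below is a minimal sketch using only the Python standard library. The function name `probe_transport_tls` and the three result labels are my own invention, and certificate verification is deliberately switched off because the only question is whether the other end speaks TLS at all.

```python
import socket
import ssl


def probe_transport_tls(host, port, timeout=5.0):
    """Classify a TCP port as "tls-ok", "tls-rejected", "no-tls" or "unreachable"."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # diagnosis only: do not verify the peer
    ctx.verify_mode = ssl.CERT_NONE
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"         # refused, filtered or timed out
    with raw:
        try:
            with ctx.wrap_socket(raw):
                return "tls-ok"      # handshake completed
        except ssl.SSLError as exc:
            # A TLS alert means the peer speaks TLS but refused this client,
            # e.g. because it insists on a client certificate.
            if "ALERT" in (exc.reason or ""):
                return "tls-rejected"
            return "no-tls"
        except OSError:
            return "no-tls"          # reset or timed out mid-handshake
```

For example, `probe_transport_tls("172.17.14.161", 9300)` run from the data node would probe the first master from the log. A result of "no-tls" on the masters while the data node has transport SSL enabled would be consistent with the mismatch described above, though it would not prove it.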

I moved this over to the #x-pack forum, as this is x-pack related.
