NotSslRecordException: not an SSL/TLS record while adding New Node to Cluster

Hi,

Hope everyone is doing well!

We are having trouble adding a new node to a cluster that has SSL enabled via X-Pack security. Before I go further, please keep the following in mind:

  • Installed the same version of ES on all nodes, i.e., 7.1.1.
  • All nodes are on the same network.
  • The same certificate was generated on a single node and the files were copied to the other nodes.
  • We are already running 2 nodes successfully with similar settings; the issue appears only when adding a 3rd node.
  • We run Elasticsearch inside Docker containers, with all configuration files volume-mounted outside the container.

Node 1 Configuration:

OS: Debian 9
Docker Version: 19.03.8
IP: 10.240.0.31

node.name: "my-elk-node1"
cluster.name: "my-elk-cluster"
network.host: 0.0.0.0
node.master: true
node.data: true
node.ingest: true
bootstrap.memory_lock: true
discovery.seed_hosts: [ "127.0.0.1","10.240.0.26:9100","10.240.0.39:9200" ]
cluster.initial_master_nodes:
 - "my-elk-node1"
 - "my-elk-node2"
 - "my-elk-node3"

action.destructive_requires_name: true
indices.recovery.max_bytes_per_sec: 100mb
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: none
xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certificate/my-elk-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certificate/my-elk-certificates.p12

Node 2 Configuration:

OS: Debian 9
Docker Version: 19.03.1
IP: 10.240.0.26

node.name: "my-elk-node2"
cluster.name: "my-elk-cluster"
network.host: 0.0.0.0
http.port: 9100
node.master: true
node.data: true
node.ingest: true
bootstrap.memory_lock: true
discovery.seed_hosts: [ "127.0.0.1","10.240.0.31","10.240.0.39:9200" ]
cluster.initial_master_nodes:
 - "my-elk-node1"
 - "my-elk-node2"
 - "my-elk-node3"

action.destructive_requires_name: true
indices.recovery.max_bytes_per_sec: 100mb
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: none
xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certificate/my-elk-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certificate/my-elk-certificates.p12

Node 3 Configuration (the one having issues):

OS: Debian 10
Docker Version: 20.10.9
IP: 10.240.0.39

node.name: "my-elk-node3"
cluster.name: "my-elk-cluster"
network.host: 0.0.0.0
node.master: true
node.data: true
node.ingest: true
bootstrap.memory_lock: true
discovery.seed_hosts: [ "127.0.0.1","10.240.0.26:9100","10.240.0.31:9200" ]
cluster.initial_master_nodes:
 - "my-elk-node1"
 - "my-elk-node2"
 - "my-elk-node3"

action.destructive_requires_name: true
indices.recovery.max_bytes_per_sec: 100mb
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: none
xpack.security.transport.ssl.keystore.path: /usr/share/elasticsearch/config/certificate/my-elk-certificates.p12
xpack.security.transport.ssl.truststore.path: /usr/share/elasticsearch/config/certificate/my-elk-certificates.p12

Our assumption was that we could simply add a node with this configuration and split our ELK stack across 3 nodes, but unfortunately it is not working.

Stack Trace from ES Logs

{"type": "server", "timestamp": "2021-11-16T13:09:05,190+0000", "level": "WARN", "component": "o.e.t.TcpTransport", "cluster.name": "my-elk-cluster", "node.name": "my-elk-node3",  "message": "exception caught on transport layer [Netty4TcpChannel{localAddress=0.0.0.0/0.0.0.0:37430, remoteAddress=null}], closing connection" , 
"stacktrace": ["io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 485454502f312e30203430302042616420526571756573740d0a636f6e74656e742d747970653a206170706c69636174696f6e2f6a736f6e3b20636861727365743d5554462d380d0a636f6e74656e742d6c656e6774683a203234310d0a0d0a7b226572726f72223a7b22726f6f745f6361757365223a5b7b2274797065223a22696c6c6567616c5f617267756d656e745f657863657074696f6e222c22726561736f6e223a22696e76616c69642076657273696f6e20666f726d61743a20c2b85c75303030305c75303030305c7530303138c38028c38024c380227d5d2c2274797065223a22696c6c6567616c5f617267756d656e745f657863657074696f6e222c22726561736f6e223a22696e76616c69642076657273696f6e20666f726d61743a20c2b85c75303030305c75303030305c7530303138c38028c38024c380227d2c22737461747573223a3430307d",
"at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]",
"at java.lang.Thread.run(Thread.java:835) [?:?]",
"Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 485454502f312e30203430302042616420526571756573740d0a636f6e74656e742d747970653a206170706c69636174696f6e2f6a736f6e3b20636861727365743d5554462d380d0a636f6e74656e742d6c656e6774683a203234310d0a0d0a7b226572726f72223a7b22726f6f745f6361757365223a5b7b2274797065223a22696c6c6567616c5f617267756d656e745f657863657074696f6e222c22726561736f6e223a22696e76616c69642076657273696f6e20666f726d61743a20c2b85c75303030305c75303030305c7530303138c38028c38024c380227d5d2c2274797065223a22696c6c6567616c5f617267756d656e745f657863657074696f6e222c22726561736f6e223a22696e76616c69642076657273696f6e20666f726d61743a20c2b85c75303030305c75303030305c7530303138c38028c38024c380227d2c22737461747573223a3430307d",
"at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1182) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]",
"... 15 more"] }
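The hex dump inside the NotSslRecordException is worth decoding, since it shows what actually arrived on the wire instead of a TLS record. A minimal sketch (the hex prefix below is copied verbatim from the stack trace above; the interpretation at the end is ours):

```python
# Decode the first bytes of the "not an SSL/TLS record" payload from the log.
# This hex prefix is copied verbatim from the stack trace above.
hex_prefix = (
    "485454502f312e30203430302042616420526571756573740d0a"
    "636f6e74656e742d747970653a206170706c69636174696f6e2f6a736f6e"
)

decoded = bytes.fromhex(hex_prefix).decode("utf-8", errors="replace")
print(decoded)
# → HTTP/1.0 400 Bad Request
#   content-type: application/json
```

So the transport channel received a plain-text HTTP 400 response where it expected a TLS handshake, which suggests the connection is being answered by an HTTP endpoint rather than another node's TLS-enabled transport port.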

Our Findings

As you can see, the major differences on Node 3 are the OS version and the Docker Engine version. So we created another machine to test this and confirmed it: the node was able to connect successfully on a machine with Debian 9 and Docker 19.

Unfortunately, we cannot downgrade our OS or Docker Engine at this moment, so we are trying to find the root cause behind this. Kindly help.

Reply from Tim

You have some misaligned configuration:

discovery.seed_hosts: [ "127.0.0.1","10.240.0.26:9100","10.240.0.39:9200" ]

You are using 10.240.0.26:9100 as a seed, but that is the HTTP port:

IP: 10.240.0.26

...
http.port: 9100

Elasticsearch nodes do not use the HTTP port to communicate with one another; they use the transport port (transport.port, which defaults to 9300).
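To illustrate the point above, here is a small hypothetical sanity check. The IPs and ports mirror the configs quoted earlier; the checker function itself is our own illustration, not an Elasticsearch tool:

```python
# Hypothetical sanity check: every discovery.seed_hosts entry should target a
# node's transport port (default 9300), never its HTTP port.
# The IP/port data below mirrors the configs quoted earlier in the thread.

# Known HTTP ports per node (http.port defaults to 9200 when unset).
HTTP_PORTS = {
    "10.240.0.31": 9200,  # my-elk-node1 (default http.port)
    "10.240.0.26": 9100,  # my-elk-node2 (http.port: 9100)
    "10.240.0.39": 9200,  # my-elk-node3 (default http.port)
}

def bad_seeds(seed_hosts):
    """Return the seeds whose explicit port is actually a node's HTTP port."""
    bad = []
    for seed in seed_hosts:
        host, _, port = seed.partition(":")
        if port and HTTP_PORTS.get(host) == int(port):
            bad.append(seed)
    return bad

# Node 1's seed list from the config above:
print(bad_seeds(["127.0.0.1", "10.240.0.26:9100", "10.240.0.39:9200"]))
# → ['10.240.0.26:9100', '10.240.0.39:9200']
```

Both explicit entries point at HTTP ports (9100 is node2's http.port, 9200 is node3's), so a node dialing those seeds would reach an HTTP listener rather than the TLS transport listener on 9300.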

Hi Tim,

It seems my post did not give you enough to debug the issue. As I already mentioned in the initial points, we are able to run the cluster successfully with Node 1 and Node 2 using the same settings, so the issue seems to be something else; otherwise, it wouldn't have worked in the first place for Node 1 and Node 2.

Kindly go through the information shared again and see if you can spot any hint.

FYI, we have now downgraded to Docker version 19, still on Debian 10, and Node 3 is working fine. So the issue seems to be something else.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.