io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException

We are encountering the exception below in our environment, which runs Elasticsearch 6.8.12 with two nodes communicating over stunnel.

Please let me know the probable cause of this exception, or provide pointers on what to check to resolve the issue.

A few points to note:

  1. The Elasticsearch versions on Node A and Node B are the same.

  2. Both the nodes A and B have the same Java version.

  3. The cluster name is the same in elasticsearch.yml on both Node A and Node B.

  4. discovery.zen.ping.unicast.hosts: ["nodeA.hostname:9300","NodeB.hostname:9300"] is one of the entries in elasticsearch.yml.

  5. telnet nodeA.hostname 9300 and telnet nodeB.hostname 9300 commands work fine.

  6. telnet nodeA.IP 9300 and telnet nodeB.IP 9300 commands work fine.

  7. nslookup on Node A’s FQDN resolves to the correct IP address of Node A.

  8. nslookup on Node A’s IP address returns the correct FQDN of Node A.

  9. The nslookup results for Node B are equally correct.

  10. RST packets are observed from Node A to Node B and vice versa during Elasticsearch’s internal handshake between the two nodes.
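
The reachability checks in points 5 and 6 can also be scripted. The following is a minimal sketch, not part of the original setup: `port_open` is a hypothetical helper name, and the host names in the usage comment are the placeholders from the list above. It only opens and closes a TCP connection and writes nothing to the socket, so it confirms that the port accepts connections but says nothing about whether the bytes that follow arrive intact.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # Connect and close immediately; no payload is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (placeholder host names taken from the list above):
#   for host in ("nodeA.hostname", "nodeB.hostname"):
#       print(host, port_open(host, 9300))
```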

elasticsearch.yml on Node A has the configuration below. Node B has a similar configuration.
transport.host: nodeA.abc.com
transport.port: 9301
transport.bind_host: 127.0.0.1
transport.publish_port: 9300
transport.publish_host: nodeA.abc.com

The stunnel configuration on Node A is below. Node B’s stunnel configuration is similar.
++++++++
fips=no
sslVersion=TLSv1.2
cert=/opt/rabbitmq/cert/containercacert.pem
key=/opt/rabbitmq/cert/containerservercertandkey.pem
pid=/var/run/stunnel.pid
output=/opt/CSCOcpm/logs/stunnel.log
debug=7
[es-transport-local]
accept=nodea.byod.local:9301
connect=127.0.0.1:9301
[es-transport]
client=yes
accept=nodea.byod.local:9300
connect=nodea.byod.local:9301
++++++++
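
To make the data path in the stunnel configuration above easier to follow, the accept/connect pairs can be listed per service. This is a rough sketch only: `stunnel_hops` is a hypothetical helper, the `[global]` header is a placeholder added so that Python's `configparser` accepts the options that precede the first section, and `SAMPLE` is an abbreviated copy of the configuration above.

```python
import configparser

# Abbreviated copy of the configuration shown above (cert/key/log lines omitted).
SAMPLE = """\
fips=no
sslVersion=TLSv1.2
debug=7
[es-transport-local]
accept=nodea.byod.local:9301
connect=127.0.0.1:9301
[es-transport]
client=yes
accept=nodea.byod.local:9300
connect=nodea.byod.local:9301
"""


def stunnel_hops(conf_text: str) -> list:
    """Return (service, mode, accept, connect) for each service section."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    # Global options come before any section header, so add a placeholder one.
    parser.read_string("[global]\n" + conf_text)
    hops = []
    for name in parser.sections():
        if name == "global":
            continue
        section = parser[name]
        # A service without client=yes runs in server mode (TLS on the accept
        # side); with client=yes, TLS is used on the connect side instead.
        mode = "client" if section.get("client", "no") == "yes" else "server"
        hops.append((name, mode, section.get("accept"), section.get("connect")))
    return hops


hops = stunnel_hops(SAMPLE)
for service, mode, accept, connect in hops:
    print(f"{service}: {mode} mode, {accept} -> {connect}")
```

Reading the two hops side by side shows which side of each service carries TLS and which carries plain transport traffic, which is the first thing to confirm when the receiving node reports unexpected bytes.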

++++++++
[2025-07-03T14:11:59,177][WARN ] [elasticsearch[5IqNBrM][transport_worker][T#1]] [o.e.t.TcpTransport ] -:::::- [5IqNBrM] exception caught on transport layer [Netty4TcpChannel{localAddress=/127.0.0.1:9301, remoteAddress=/127.0.0.1:40740}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (d,a,ff,f4)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_372]
Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (d,a,ff,f4)
    at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:851) ~[elasticsearch-6.8.12.jar:6.8.12]
    at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:837) ~[elasticsearch-6.8.12.jar:6.8.12]
    at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:40) ~[transport-netty4-client-6.8.12.jar:6.8.12]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
    ... 19 more
++++++++
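
The four bytes quoted in the exception, `(d,a,ff,f4)`, can be decoded to see what the node actually received at the start of the connection. The sketch below assumes that a valid Elasticsearch transport message begins with the two marker bytes `E` and `S`; the helper names `parse_header` and `describe_header` are hypothetical and exist only for this illustration.

```python
def parse_header(text: str) -> bytes:
    """Turn a string such as '(d,a,ff,f4)' into the raw bytes it describes."""
    return bytes(int(part, 16) for part in text.strip("()").split(","))


def describe_header(header: bytes) -> str:
    # Assumption: transport messages start with the marker bytes 'E', 'S'.
    if header[:2] == b"ES":
        return "looks like an Elasticsearch transport header"
    return f"not a transport header: {header!r}"


header = parse_header("(d,a,ff,f4)")
print(describe_header(header))
# prints: not a transport header: b'\r\n\xff\xf4'
```

The first two bytes are a carriage return and line feed, and `0xff 0xf4` matches the Telnet "Interpret As Command" and "Interrupt Process" codes from RFC 854. That is consistent with, but does not prove, an interactive client such as telnet having sent input on this connection rather than another Elasticsearch node.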

As it says, the stream of data is being corrupted on its way between the nodes.

Also, 6.8 is irresponsibly old at this point: it hasn’t been supported or maintained for many years. You need to upgrade to a supported version as a matter of urgency. All supported versions support TLS natively, so there is no need to muck around with stunnel. Moreover, modern versions have much better support for troubleshooting this kind of thing.

Thanks for the response, David.

  1. Can you please let me know the possible causes of the corruption?
  2. Is there any way to recover from this state?

Could be anything really, although whatever it is, it’s outside Elasticsearch and relates to something else in your setup.

Yes, the error will go away once each node is receiving exactly the messages that the other node sends.