java.io.StreamCorruptedException: invalid internal transport message format, got (3,41,4d,52)

Hi all.

I have a 13-node cluster running Elasticsearch 1.6.0:

3 dedicated masters
4 dedicated clients
6 dedicated data nodes

All was well until one of the data nodes logged the following exception and disconnected itself from the cluster:

[2015-07-09 09:53:06,953][WARN ][transport.netty          ] [elasticsearch-bdprodes08] exception caught on transport layer [[id: 0x6b716852, /10.200.116.249:60911 :> /10.200.116.248:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format, got (3,41,4d,52)
        at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:63)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2015-07-09 09:53:07,293][INFO ][discovery.zen            ] [elasticsearch-bdprodes08] master_left [[elasticsearch-bdprodes03-2][AAxFokkhSBqOTojbhGO-EQ][bdprodes03][inet[/10.200.116.70:9301]]{data=false, master=true}], reason [do not exists on master, act as master failure]
[2015-07-09 09:53:07,413][WARN ][discovery.zen            ] [elasticsearch-bdprodes08] master left (reason = do not exists on master, act as master failure), current nodes: {[elasticsearch-bdprodes01-2][D9MwysnEQXWKr5scrYLNpA][bdprodes01][inet[/10.200.116.68:9301]]{data=false, master=true},[elasticsearch-bdprodes05][DvJiqAM9TE-CVRL6v3YArw][bdprodes05][inet[/10.200.116.72:9300]]{master=false},[elasticsearch-bdprodes09][VoCSUvcRQFKW65EgV4bBYQ][bdprodes09][inet[/10.200.116.249:9300]]{master=false},[elasticsearch-bdprodes06][yyJn5RjZQpeg5hIf0e_4QA][bdprodes06][inet[/10.200.116.73:9300]]{master=false},[elasticsearch-bdprodes02-2][G9tH1fyITSqVW8lJ9vccVw][bdprodes02][inet[/10.200.116.69:9301]]{data=false, master=true},[elasticsearch-bdprodes02][sJM-puI8RSmTdQjIW85J4Q][bdprodes02][inet[/10.200.116.69:9300]]{data=false, master=false},[elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[bdprodes08.dbhotelcloud.com/10.200.116.248:9300]]{master=false},[elasticsearch-bdprodes04][_VGeVBHRR1ukhYkrZ1rVOQ][bdprodes04][inet[/10.200.116.71:9300]]{data=false, master=false},[elasticsearch-bdprodes10][BvTjSLARRmiMFdBkrIUdHQ][bdprodes10][inet[/10.200.116.250:9300]]{master=false},[elasticsearch-bdprodes01][nMPRh9BUSPaj_PWgk6LBoQ][bdprodes01][inet[/10.200.116.68:9300]]{data=false, master=false},[elasticsearch-bdprodes03][oQj7PUa8R5aw_2jcIxCbfA][bdprodes03][inet[/10.200.116.70:9300]]{data=false, master=false},[elasticsearch-bdprodes07][y6JfVvVpRjev4y6PEil9Eg][bdprodes07][inet[/10.200.116.247:9300]]{master=false},}
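For reference, those four bytes decode to 0x03 followed by ASCII 'A' 'M' 'R', which is not a valid transport header: internal transport messages in 1.x are expected to start with the ASCII bytes 'E' 'S'. A rough sketch of the kind of check that throws here (simplified for illustration, with a made-up class name; not the actual SizeHeaderFrameDecoder source):

import java.io.StreamCorruptedException;

public class TransportHeaderCheck {
    // Internal transport messages begin with the ASCII bytes 'E' 'S',
    // followed by a four-byte payload length. Anything else is rejected.
    static void checkHeader(byte[] buf) throws StreamCorruptedException {
        if (buf[0] != 'E' || buf[1] != 'S') {
            throw new StreamCorruptedException(String.format(
                    "invalid internal transport message format, got (%x,%x,%x,%x)",
                    buf[0], buf[1], buf[2], buf[3]));
        }
    }

    public static void main(String[] args) {
        try {
            // The bytes from the log above: 0x03, 'A' (0x41), 'M' (0x4d), 'R' (0x52)
            checkHeader(new byte[]{0x03, 0x41, 0x4d, 0x52});
        } catch (StreamCorruptedException e) {
            System.out.println(e.getMessage()); // matches the log line
        }
    }
}

So whatever arrived on port 9300 was not a well-formed internal message, and the node dropped the connection.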

The other data nodes simply logged this:

[2015-07-09 05:34:18,757][INFO ][cluster.service          ] [elasticsearch-bdprodes05] removed {[elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[/10.200.116.248:9300]]{master=false},}, reason: zen-disco-receive(from master [[elasticsearch-bdprodes03-2][AAxFokkhSBqOTojbhGO-EQ][bdprodes03][inet[/10.200.116.70:9301]]{data=false, master=true}])

And the master logged this:

[2015-07-09 05:34:18,598][INFO ][cluster.service          ] [elasticsearch-bdprodes03-2] removed {[elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[bdprodes08.dbhotelcloud.com/10.200.116.248:9300]]{master=false},}, reason: zen-disco-node_failed([elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[bdprodes08.dbhotelcloud.com/10.200.116.248:9300]]{master=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout

What the heck happened?? :smile:
This one was a new one for me. I tried just bouncing the node that hit the exception, but that did not restore the cluster, so I ended up doing a full cluster restart. Ouch.

Anyone seen this before?
Many thanks!
Chris

Things like this can occur from mixing different JVM versions across nodes. Are they all the same?
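A quick way to compare them is the nodes info API (GET /_nodes/jvm), which reports each node's JVM. A throwaway sketch, assuming an HTTP-enabled node is reachable on localhost:9200:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class CheckJvmVersions {
    public static void main(String[] args) throws Exception {
        // /_nodes/jvm returns a "jvm" section per node; its "version"
        // field is the Java version (e.g. "1.7.0_65").
        URL url = new URL("http://localhost:9200/_nodes/jvm?pretty");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Keep just the node names and version fields; note this
                // also prints each node's Elasticsearch version line.
                if (line.contains("\"name\"") || line.contains("\"version\"")) {
                    System.out.println(line.trim());
                }
            }
        }
    }
}

Any node whose jvm.version differs from the rest is a candidate for producing garbled transport traffic.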

Well, they were supposed to be.
Masters and clients are on java version "1.7.0_65", while the data nodes are on java version "1.7.0_79".

Guess I'll be upgrading the masters and clients in the morning.

Thanks, Mark. You are seriously all over this group. I hope you're getting paid for all your free help! If not, I'll have to send you a beer or 12. I know you've personally answered many of my questions!

Chris

Elastic pays me to work for them :wink:
