java.io.StreamCorruptedException: invalid internal transport message format, got (3,41,4d,52)


(Chris Neal) #1

Hi all.

I have a 13-node cluster running 1.6.0:

3 dedicated masters
4 dedicated clients
6 dedicated data nodes

All was well until one of the data nodes logged the following exception, and disconnected itself from my cluster:

[2015-07-09 09:53:06,953][WARN ][transport.netty          ] [elasticsearch-bdprodes08] exception caught on transport layer [[id: 0x6b716852, /10.200.116.249:60911 :> /10.200.116.248:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format, got (3,41,4d,52)
        at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:63)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
[2015-07-09 09:53:07,293][INFO ][discovery.zen            ] [elasticsearch-bdprodes08] master_left [[elasticsearch-bdprodes03-2][AAxFokkhSBqOTojbhGO-EQ][bdprodes03][inet[/10.200.116.70:9301]]{data=false, master=true}], reason [do not exists on master, act as master failure]
[2015-07-09 09:53:07,413][WARN ][discovery.zen            ] [elasticsearch-bdprodes08] master left (reason = do not exists on master, act as master failure), current nodes: {[elasticsearch-bdprodes01-2][D9MwysnEQXWKr5scrYLNpA][bdprodes01][inet[/10.200.116.68:9301]]{data=false, master=true},[elasticsearch-bdprodes05][DvJiqAM9TE-CVRL6v3YArw][bdprodes05][inet[/10.200.116.72:9300]]{master=false},[elasticsearch-bdprodes09][VoCSUvcRQFKW65EgV4bBYQ][bdprodes09][inet[/10.200.116.249:9300]]{master=false},[elasticsearch-bdprodes06][yyJn5RjZQpeg5hIf0e_4QA][bdprodes06][inet[/10.200.116.73:9300]]{master=false},[elasticsearch-bdprodes02-2][G9tH1fyITSqVW8lJ9vccVw][bdprodes02][inet[/10.200.116.69:9301]]{data=false, master=true},[elasticsearch-bdprodes02][sJM-puI8RSmTdQjIW85J4Q][bdprodes02][inet[/10.200.116.69:9300]]{data=false, master=false},[elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[bdprodes08.dbhotelcloud.com/10.200.116.248:9300]]{master=false},[elasticsearch-bdprodes04][_VGeVBHRR1ukhYkrZ1rVOQ][bdprodes04][inet[/10.200.116.71:9300]]{data=false, master=false},[elasticsearch-bdprodes10][BvTjSLARRmiMFdBkrIUdHQ][bdprodes10][inet[/10.200.116.250:9300]]{master=false},[elasticsearch-bdprodes01][nMPRh9BUSPaj_PWgk6LBoQ][bdprodes01][inet[/10.200.116.68:9300]]{data=false, master=false},[elasticsearch-bdprodes03][oQj7PUa8R5aw_2jcIxCbfA][bdprodes03][inet[/10.200.116.70:9300]]{data=false, master=false},[elasticsearch-bdprodes07][y6JfVvVpRjev4y6PEil9Eg][bdprodes07][inet[/10.200.116.247:9300]]{master=false},}

The other data nodes simply logged this:

[2015-07-09 05:34:18,757][INFO ][cluster.service          ] [elasticsearch-bdprodes05] removed {[elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[/10.200.116.248:9300]]{master=false},}, reason: zen-disco-receive(from master [[elasticsearch-bdprodes03-2][AAxFokkhSBqOTojbhGO-EQ][bdprodes03][inet[/10.200.116.70:9301]]{data=false, master=true}])

And the master logged this:

[2015-07-09 05:34:18,598][INFO ][cluster.service          ] [elasticsearch-bdprodes03-2] removed {[elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[bdprodes08.dbhotelcloud.com/10.200.116.248:9300]]{master=false},}, reason: zen-disco-node_failed([elasticsearch-bdprodes08][ZxaQ4iHJTO-vx5L8VqTbZA][bdprodes08][inet[bdprodes08.dbhotelcloud.com/10.200.116.248:9300]]{master=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout

What the heck happened?? :smile:
This was a new one for me. I tried just bouncing the node that hit the exception, but that did not restore the cluster, so I ended up doing a full restart. Ouch.
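
For what it's worth, the bytes in the exception message can be decoded by hand, and they don't look like an Elasticsearch transport header at all. If I'm reading SizeHeaderFrameDecoder right, every internal transport message should start with the ASCII bytes 'E','S', so whatever wrote (3,41,4d,52) to port 9300 wasn't speaking the transport protocol. A minimal, illustrative sketch (the class and values below are just for decoding the logged bytes, not taken from the Elasticsearch source):

// Illustrative only: decode the bytes from the exception message and compare
// them with the 'E','S' prefix an Elasticsearch transport message is expected
// to start with. Prints the received bytes as ASCII (".AMR": a non-printable
// 0x03 followed by 'A', 'M', 'R').
public class TransportHeaderCheck {
    public static void main(String[] args) {
        int[] received = {0x03, 0x41, 0x4d, 0x52};  // the (3,41,4d,52) from the log
        char[] expected = {'E', 'S'};               // expected transport header prefix

        StringBuilder ascii = new StringBuilder();
        for (int b : received) {
            ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
        }
        System.out.println("received as ASCII: " + ascii);          // ".AMR"
        System.out.println("looks like a transport message: "
                + (received[0] == expected[0] && received[1] == expected[1]));  // false
    }
}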

Anyone seen this before?
Many thanks!
Chris


(Mark Walkom) #2

Things like this can occur when nodes are running different JVM versions. Are they all the same?


(Chris Neal) #3

Well, they were supposed to be.
Masters and clients are on java version "1.7.0_65"; the data nodes are on java version "1.7.0_79".
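
For anyone else hitting this, you don't have to shell into every box to check: the nodes info API reports the JVM each node is running (GET /_nodes/jvm). A rough sketch, assuming HTTP is enabled on port 9200 (the hostname below is just a placeholder):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Rough sketch: call the nodes info API and dump the JSON, which includes a
// "jvm" section (with the Java version) for every node in the cluster.
// "bdprodes01:9200" is a placeholder; point it at any node with HTTP enabled.
public class ListNodeJvms {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://bdprodes01:9200/_nodes/jvm");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);  // grep the output for "version" to spot mismatches
            }
        }
    }
}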

Guess I'll be upgrading the masters and clients in the morning.

Thanks Mark. You are seriously all over this group. I hope you're getting paid for all your free help! If not, I'll have to send you a beer or 12. I know you've personally answered many of my questions!

Chris


(Mark Walkom) #4

Elastic pays me to work for them :wink:


(system) #5