StreamCorruptedException


(Matthew A. Brown) #1

Hi all,

We've got a cluster of 8 nodes. Recently we started experiencing
intermittent hangs of our application (we've since added timeouts to
prevent this)… at the time, all nodes in the cluster were reporting
green, but on further examination, one node was throwing the below
errors.

A bounce of the node resolved our issues. Any ideas on what
happened/how to catch this in the future? Thanks!
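One way to catch this in the future, sketched under assumptions: cluster health stayed green, so the signal has to come from the node's own log. Scan the log for `StreamCorruptedException` warnings and alert when a single connection keeps producing them. The sketch assumes each entry's header sits on one line, as in the node's own log file. `CorruptStreamScanner` and the alert threshold are hypothetical. This is not an Elasticsearch feature.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner (not part of Elasticsearch): counts
// StreamCorruptedException warnings per connection so a node that still
// reports green but keeps logging them can be flagged by a monitoring job.
class CorruptStreamScanner {
    // Channel description as printed in the excerpt: "[[id: 0x..., /src:port => /dst:port]]".
    // Netty 3 prints ":>" instead of "=>" for a channel that is no longer connected.
    private static final Pattern CHANNEL = Pattern.compile(
        "\\[\\[id: 0x[0-9a-fA-F]+, (/\\S+) (?:=>|:>) (/[^\\s\\]]+)\\]\\]");

    static Map<String, Integer> countByConnection(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        String pending = null; // connection named by the most recent WARN header line
        for (String line : lines) {
            Matcher m = CHANNEL.matcher(line);
            if (m.find()) {
                // Normalize ":>" to "=>" so both states count toward the same connection.
                pending = m.group(1) + " => " + m.group(2);
            } else if (pending != null
                    && line.trim().startsWith("java.io.StreamCorruptedException")) {
                counts.merge(pending, 1, Integer::sum);
                pending = null;
            }
        }
        return counts;
    }

    // True if any single connection produced at least `threshold` errors.
    static boolean shouldAlert(Map<String, Integer> counts, int threshold) {
        return counts.values().stream().anyMatch(c -> c >= threshold);
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "[2011-12-16 15:46:48,435][WARN ][transport.netty ] [prod-elasticsearch-r04] "
                + "Exception caught on netty layer [[id: 0x7361b0bc, /10.180.35.110:60042 => /10.180.46.203:9300]]",
            "java.io.StreamCorruptedException: invalid data length: 0",
            "    at org.elasticsearch.transport.netty.MessageChannelHandler.callDecode(MessageChannelHandler.java:137)",
            "[2011-12-16 15:48:50,601][WARN ][transport.netty ] [prod-elasticsearch-r04] "
                + "Exception caught on netty layer [[id: 0x7361b0bc, /10.180.35.110:60042 :> /10.180.46.203:9300]]",
            "java.io.StreamCorruptedException: invalid data length: 0");
        Map<String, Integer> counts = countByConnection(log);
        System.out.println(counts);
        System.out.println("alert: " + shouldAlert(counts, 2));
    }
}
```

Keying by connection rather than by node means a job run from cron or a log shipper can distinguish one bad channel from a node-wide problem. The threshold is a tuning knob, not a recommended value.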

Errors from the node that was reporting green:

[2011-12-16 15:46:48,435][WARN ][transport.netty ] [prod-elasticsearch-r04] Exception caught on netty layer [[id: 0x7361b0bc, /10.180.35.110:60042 => /10.180.46.203:9300]]
java.io.StreamCorruptedException: invalid data length: 0
    at org.elasticsearch.transport.netty.MessageChannelHandler.callDecode(MessageChannelHandler.java:137)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:101)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
[2011-12-16 15:48:50,171][INFO ][node ] [prod-elasticsearch-r04] {0.18.5}[6520]: stopping ...
[2011-12-16 15:48:50,601][WARN ][transport.netty ] [prod-elasticsearch-r04] Exception caught on netty layer [[id: 0x7361b0bc, /10.180.35.110:60042 :> /10.180.46.203:9300]]
java.io.StreamCorruptedException: invalid data length: 0
    at org.elasticsearch.transport.netty.MessageChannelHandler.callDecode(MessageChannelHandler.java:137)
    at org.elasticsearch.transport.netty.MessageChannelHandler.cleanup(MessageChannelHandler.java:170)
    at org.elasticsearch.transport.netty.MessageChannelHandler.channelDisconnected(MessageChannelHandler.java:119)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:360)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.close(NioWorker.java:595)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:101)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
    at org.elasticsearch.common.netty.channel.Channels.close(Channels.java:720)
    at org.elasticsearch.common.netty.channel.AbstractChannel.close(AbstractChannel.java:200)
    at org.elasticsearch.transport.netty.NettyTransport$NodeChannels.closeChannelsAndWait(NettyTransport.java:706)
    at org.elasticsearch.transport.netty.NettyTransport$NodeChannels.close(NettyTransport.java:695)
    at org.elasticsearch.transport.netty.NettyTransport$5.run(NettyTransport.java:332)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
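For context on the message itself, here is a simplified model of a length-prefixed frame decoder. It assumes, without confirmation against the 0.18.5 source, that each transport message starts with a 4-byte big-endian length and that the decoder rejects any length that is not positive. The text `invalid data length: 0` is consistent with that reading. `LengthPrefixedDecoder` is a hypothetical name, and this is not Elasticsearch's `MessageChannelHandler`.

```java
import java.io.StreamCorruptedException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified model of a length-prefixed frame decoder. NOT the Elasticsearch
// MessageChannelHandler: the 4-byte big-endian length header is an assumption
// used only to illustrate how "invalid data length: 0" can arise.
class LengthPrefixedDecoder {
    static List<byte[]> decode(ByteBuffer in) throws StreamCorruptedException {
        List<byte[]> frames = new ArrayList<>();
        while (in.remaining() >= 4) {
            int dataLen = in.getInt(in.position()); // peek at the header without consuming it
            if (dataLen <= 0) {
                // A zero or negative length cannot describe a real message: the reader
                // is out of sync with message boundaries (or the peer sent garbage).
                throw new StreamCorruptedException("invalid data length: " + dataLen);
            }
            if (in.remaining() < 4 + dataLen) {
                break; // incomplete frame: leave the bytes in place and wait for more
            }
            in.getInt(); // consume the header
            byte[] payload = new byte[dataLen];
            in.get(payload);
            frames.add(payload);
        }
        return frames;
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer good = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3});
        good.flip();
        System.out.println(decode(good).size()); // 1

        ByteBuffer bad = ByteBuffer.allocate(8); // all zeros, so the header reads as length 0
        try {
            decode(bad);
        } catch (StreamCorruptedException e) {
            System.out.println(e.getMessage()); // invalid data length: 0
        }
    }
}
```

In this model the buffer position is never advanced past a zero header, so every later decode on the same connection fails the same way until the channel is closed. Whether the real transport behaves identically is an assumption.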


(Shay Banon) #2

Are the IPs mentioned in the message part of the cluster?
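One way to answer this from the log alone is to pull both addresses out of the `[[id: ...]]` channel description and compare them with the cluster's node addresses. This is a hypothetical sketch that handles IPv4 only. The set of node IPs is an assumption you would fill in with the 8 nodes' actual addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts both IPv4 addresses from a Netty channel
// description such as "[[id: 0x7361b0bc, /10.180.35.110:60042 => /10.180.46.203:9300]]".
class ChannelPeers {
    private static final Pattern ADDR = Pattern.compile("/(\\d{1,3}(?:\\.\\d{1,3}){3}):\\d+");

    static List<String> ips(String channelDescription) {
        List<String> out = new ArrayList<>();
        Matcher m = ADDR.matcher(channelDescription);
        while (m.find()) {
            out.add(m.group(1)); // IP without the leading "/" and the ":port" suffix
        }
        return out;
    }

    // IPs from the channel that are not in the supplied set of cluster node IPs.
    static List<String> unknownIps(String channelDescription, Set<String> clusterIps) {
        List<String> unknown = new ArrayList<>();
        for (String ip : ips(channelDescription)) {
            if (!clusterIps.contains(ip)) {
                unknown.add(ip);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        String channel = "[[id: 0x7361b0bc, /10.180.35.110:60042 => /10.180.46.203:9300]]";
        // Hypothetical node list; in practice, the addresses of all cluster nodes.
        Set<String> clusterIps = Set.of("10.180.35.110", "10.180.46.203");
        System.out.println(ips(channel));                    // [10.180.35.110, 10.180.46.203]
        System.out.println(unknownIps(channel, clusterIps)); // []
    }
}
```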

On Fri, Dec 16, 2011 at 6:31 PM, Matthew A. Brown mat.a.brown@gmail.com wrote:


(Matthew A. Brown) #3

Yes, they are -- thanks!

On Fri, Dec 16, 2011 at 12:30, Shay Banon kimchy@gmail.com wrote:


(Shay Banon) #4

And once you saw those messages, the node got "stuck"? I see that the
exception and the node shutting down are very close in time (and one happens
during the stop process).
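The gaps can be computed from the excerpt's timestamps. The first warning comes roughly two minutes before `stopping ...`, and the second arrives 430 ms after it, during shutdown. This is a small sketch using `java.time`. The timestamp pattern is read off the log lines in the first post, and `LogGap` is a hypothetical name.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: time between two log timestamps in the excerpt's format.
class LogGap {
    // Layout of the leading timestamp, e.g. "2011-12-16 15:46:48,435".
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, TS), LocalDateTime.parse(to, TS));
    }

    public static void main(String[] args) {
        String firstWarning = "2011-12-16 15:46:48,435";
        String stopping     = "2011-12-16 15:48:50,171";
        String lastWarning  = "2011-12-16 15:48:50,601";
        System.out.println(between(firstWarning, stopping).toMillis()); // 121736
        System.out.println(between(stopping, lastWarning).toMillis());  // 430
    }
}
```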

On Fri, Dec 16, 2011 at 8:21 PM, Matthew A. Brown mat.a.brown@gmail.com wrote:

(Matthew A. Brown) #5

Ah, that's just a coincidence of me pasting in the tail of the logfile
-- there were plenty more of these well before the shutdown.

What we were experiencing was requests that hung indefinitely; the
problem was resolved (and those errors stopped appearing) once we
bounced the node.
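The timeouts mentioned in the first post are what keep such hangs from blocking the application. Here is a generic sketch of that idea with plain `java.util.concurrent`. No Elasticsearch client API is used or implied. It bounds the wait with `Future.get(timeout, unit)` and cancels the task when the wait expires. `BoundedCall` and `slowSearch` are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: run a blocking call on a pool and wait at most timeoutMs.
class BoundedCall {
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> f = pool.submit(task);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // attempt to interrupt the worker so stuck calls do not pile up
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Hypothetical stand-in for a request that never gets a response.
            Callable<String> slowSearch = () -> { Thread.sleep(60_000); return "late"; };
            try {
                callWithTimeout(pool, slowSearch, 200);
            } catch (TimeoutException e) {
                System.out.println("timed out");
            }
            System.out.println(callWithTimeout(pool, () -> "ok", 200)); // ok
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A timeout only contains the symptom. Counting how often it fires, per target node if the client exposes that, is what would have pointed at the bad node sooner.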

Thanks!

On Fri, Dec 16, 2011 at 13:25, Shay Banon kimchy@gmail.com wrote:


(Shay Banon) #6

I tried to chase it down and check why it might happen, but no luck. Is
there a way for me to somehow recreate it? (I know, I know, it's hard, but
still worth asking.)

On Fri, Dec 16, 2011 at 8:29 PM, Matthew A. Brown mat.a.brown@gmail.com wrote:

(Matthew A. Brown) #7

Yeah, unfortunately, as you can imagine, we've no idea how one might
recreate it. The best we can do, I suppose, is keep an eye on things and
see whether we notice a pattern if it happens again.
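If it does happen again, bucketing the transport warnings by hour is a cheap way to look for a pattern, such as a time of day or a correlation with deploys or load. This is a hypothetical sketch that assumes the log line layout shown in the first post. It counts every `[WARN ][transport.netty ...]` entry, not only the `StreamCorruptedException` ones.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts transport-layer WARN entries per hour.
class WarnHistogram {
    // Captures "yyyy-MM-dd HH" from a header like "[2011-12-16 15:46:48,435][WARN ][transport.netty ]".
    private static final Pattern HEADER = Pattern.compile(
        "^\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}):\\d{2}:\\d{2},\\d{3}\\]\\[WARN \\]\\[transport\\.netty");

    static Map<String, Integer> perHour(List<String> lines) {
        Map<String, Integer> buckets = new TreeMap<>(); // sorted, so hours print in order
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                buckets.merge(m.group(1) + ":00", 1, Integer::sum);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "[2011-12-16 15:46:48,435][WARN ][transport.netty ] [prod-elasticsearch-r04] Exception caught on netty layer",
            "[2011-12-16 15:48:50,171][INFO ][node ] [prod-elasticsearch-r04] {0.18.5}[6520]: stopping ...",
            "[2011-12-16 15:48:50,601][WARN ][transport.netty ] [prod-elasticsearch-r04] Exception caught on netty layer");
        System.out.println(perHour(log)); // {2011-12-16 15:00=2}
    }
}
```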

Thanks for looking into it!

On Wed, Dec 21, 2011 at 20:03, Shay Banon kimchy@gmail.com wrote:

