Serialization & ES 0.90 Beta1

Hi All -

I'm working to get everything up and running on version 0.90 Beta1 and am
seeing a lot of serialization issues:

Message not fully read (response) for [459335] handler org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$3@5922a044, error [false], resetting

Message not fully read (response) for [452868] handler org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction$4@3b573360, error [false], resetting

as well as the message below, including the stack trace, for an over-read.

After pulling the commit for Beta1 and adding more detail to the error
logger, it appears the TransportBroadcastOperationAction error comes from an
indices/stats action. There doesn't seem to be any pattern to which hosts
are sending the responses.

Reading through the archive, it appears this is normally associated with
out-of-sync JVM or ES versions. I've checked via the nodes info API, and all
nodes report the same ES version and the same JVM version (1.6.0_31; I'd
love to upgrade, but I'm somewhat limited by the supported JVMs in the rest
of the stack).

The errors recur roughly every 30 seconds, with multiple lines of the same
message (with different object reference addresses, of course) reported each
time. The cluster itself is 3 master-only nodes, 10 data-only nodes, and 5
transport client nodes. All indices and cluster state were created with this
version (0.90). There are 516 primary shards across 117 indices.

So far, I haven't been able to reproduce this locally (I've run 2 data nodes
with 1 master; trying to create the same index set results in major
swapping, though).

Has anyone else seen something similar, or have any good ideas on other
things to check?

Thank you,
Angus

Log & stack below

[2013-03-13 21:20:55,275][WARN ][transport.netty ] [us3es012r09.comp.prod.local] Message read past expected size (response) for [863366] handler org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$3@383f895d, error [false], resetting
[2013-03-13 21:20:55,275][DEBUG][action.admin.indices.stats] [us3es012r09.comp.prod.local] [social_content_1359244800000_v1][1], node[4UaWqDS1RB-zGSxC5IKPFg], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@4ead09b2]
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize response of type [org.elasticsearch.action.admin.indices.stats.ShardStats]
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize response of type [org.elasticsearch.action.admin.indices.stats.ShardStats]
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:150)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:127)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IndexOutOfBoundsException: Readable byte limit exceeded: 1256
    at org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.readByte(AbstractChannelBuffer.java:236)
    at org.elasticsearch.transport.netty.ChannelBufferStreamInput.readByte(ChannelBufferStreamInput.java:132)
    at org.elasticsearch.common.io.stream.HandlesStreamInput.readString(HandlesStreamInput.java:47)
    at org.elasticsearch.index.search.stats.SearchStats.readFrom(SearchStats.java:239)
    at org.elasticsearch.index.search.stats.SearchStats.readSearchStats(SearchStats.java:228)
    at org.elasticsearch.action.admin.indices.stats.CommonStats.readFrom(CommonStats.java:212)
    at org.elasticsearch.action.admin.indices.stats.CommonStats.readCommonStats(CommonStats.java:193)
    at org.elasticsearch.action.admin.indices.stats.ShardStats.readFrom(ShardStats.java:69)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:148)
    ... 23 more


A little bit of a follow-up:

After digging at this for a while longer, it looks like StoreStats is
overflowing its throttleTimeInNanos counter, which wraps around to a
negative value and then interacts badly with writeVLong.
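
To make the writeVLong angle concrete, here is a minimal, self-contained sketch (assumed helper names, not the actual Elasticsearch StreamOutput code) of the usual 7-bits-per-byte vlong encoding. The encoder assumes a non-negative value, so a long with the sign bit set spills into extra continuation bytes, and the two sides of the transport can end up disagreeing about where the field ends, which is consistent with the "read past expected size" / "not fully read" warnings:

// Sketch only: illustrates the vlong encoding assumption, not ES internals.
import java.io.ByteArrayOutputStream;

public class VLongOverflowSketch {

    // Write 'value' 7 bits at a time, low bits first; the high bit of each byte is a
    // continuation flag. With the unsigned shift, a negative long takes 10 bytes.
    static byte[] writeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long saneThrottleNanos = 42000000L;            // a plausible per-shard throttle time
        long wrappedCounter = Long.MIN_VALUE + 1000L;  // what an overflowed counter looks like
        System.out.println("non-negative value -> " + writeVLong(saneThrottleNanos).length + " bytes");
        System.out.println("negative value     -> " + writeVLong(wrappedCounter).length + " bytes");
        // The negative value needs 10 bytes, while a reader that assumes a non-negative
        // vlong was written can stop early or run past the message boundary, leaving the
        // stream misaligned for everything that follows in the same response.
    }
}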

The root of the issue appears to be Lucene's new SimpleRateLimiter: instead
of returning the number of nanoseconds it actually paused, it returns the
absolute nanosecond timestamp at which the caller should exit the method.
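
As an illustration of why that return value matters (assumed method names below, not Lucene's actual source): if the limiter returns the absolute target time instead of the slept delta, every call adds something on the order of System.nanoTime() to the throttle counter rather than a few milliseconds' worth of nanos, so the counter races toward Long.MAX_VALUE and eventually wraps negative:

// Sketch contrasting the two return-value contracts described above.
public class ThrottleAccumulationSketch {

    // Buggy contract: return the absolute nanosecond at which the caller may resume.
    static long pauseReturningTargetTime(long pauseNanos) {
        return System.nanoTime() + pauseNanos;
    }

    // Intended contract: return how long the caller actually had to pause.
    static long pauseReturningDelta(long pauseNanos) {
        return pauseNanos;
    }

    public static void main(String[] args) {
        long buggyTotal = 0L;
        long fixedTotal = 0L;
        for (int i = 0; i < 1000; i++) {             // e.g. many throttled pauses between stats calls
            buggyTotal += pauseReturningTargetTime(1000000L);
            fixedTotal += pauseReturningDelta(1000000L);
        }
        // Accumulating deltas stays a sane throttle time (1 ms * 1000 calls = 1 s).
        // Accumulating target times grows by roughly System.nanoTime() per call, bears no
        // relation to time actually spent throttled, and overflows a signed long after
        // enough calls, at which point writeVLong is handed a negative number.
        System.out.println("accumulating deltas:       " + fixedTotal + " ns");
        System.out.println("accumulating target times: " + buggyTotal + " ns");
    }
}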

I've opened an issue for this: "StoreStats's throttleTimeInNanos overflows
causing serialization issues" (elastic/elasticsearch issue #2785 on GitHub).

Thanks,
Angus

On Wed, Mar 13, 2013 at 8:31 PM, Angus Davis adavis@airheadtech.com wrote:

