On our six-node Elasticsearch 1.3.0 cluster, we observed a StackOverflowError on one of the nodes. The cluster was under moderate bulk indexing load at the time.
Clients were receiving a node-disconnected exception for this node, and drilling further into the logs turned up the exception shown below.
We had to restart the Elasticsearch service, but the cluster failed to recover to a green state: at least 2 indices were stuck initializing or relocating.
[2015-07-09 08:59:54,214][WARN ][org.elasticsearch ] Exception cause unwrapping ran for 10 levels...
org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:125)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.StackOverflowError
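
For anyone triaging a similar trace, here is a rough sketch of how the "Caused by" chain above could be summarized. It is a standalone helper, not part of Elasticsearch; the regex, the function name `summarize_hops`, and the shortened `sample` are my own assumptions based on the line format in this log. It lists, in order, which node each RemoteTransportException came from. In the log above the chain alternates almost entirely between two nodes, which would be consistent with the bulk/shard request bouncing back and forth between them, though that reading is a guess, not a confirmed diagnosis.

```python
import re
from collections import Counter

# Assumed line format (taken from the log above):
#   ...RemoteTransportException: [node-name][inet[/ip:port]][action]
# Lines without a bracketed node name (e.g. "Failed to deserialize ...") are skipped.
HOP = re.compile(
    r"RemoteTransportException: \[([^\]]+)\]\[inet\[[^\]]*\]\]\[([^\]]+)\]"
)

def summarize_hops(log_text):
    """Return (hops, counts): node names in order of appearance, and per-node totals."""
    hops = [m.group(1) for m in HOP.finditer(log_text)]
    return hops, Counter(hops)

# Shortened excerpt of the trace above, used only for illustration.
sample = """\
org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-3-QA2906-perf][inet[/172.31.34.173:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: [metrics-datastore-6-QA2906-perf][inet[/172.31.44.76:9300]][bulk/shard]
Caused by: org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
"""

hops, counts = summarize_hops(sample)
for node, n in counts.most_common():
    print(node, n)
```

Running it against the full node log file (read into a string) instead of `sample` should show how many hops each node contributed to the chain.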