Random node failures

I recently brought up a 2-node cluster running 1.5.2-1 and things were working fine. It's been about 2 weeks now, and for the past few days I have had to restart the nodes for the cluster to recover and become usable again.

I am running 2 r3.xlarge instances in AWS, each configured with a 1TB GP2 EBS volume, and I am using the ec2 discovery method. When things start to go bad, the last thing reported in the log is:
Caused by: org.elasticsearch.transport.TransportException: TransportService is closed stopped can't send request
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:270)

When I look at the bigdesk plugin, cluster health shows green; however, all I get from Kibana is a gateway error when trying a search. If I restart Elasticsearch, things seem to be fine for around 20 hours or so, and then the same situation recurs.
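If it helps, these are the standard calls I can run directly against one of the nodes the next time it happens, instead of going through bigdesk/Kibana (plain _cluster/health and _cat APIs, nothing specific to my setup; esnode1 is just one of the hostnames from the logs):

curl -s 'http://esnode1:9200/_cluster/health?pretty'
curl -s 'http://esnode1:9200/_cat/nodes?v'
curl -s 'http://esnode1:9200/_cat/indices?v'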

I also have a single-node cluster that I use as a test machine, and it has never displayed this behavior.

Can I provide further details/configuration to help diagnose whether I have misconfigured something, or to help determine whether I am hitting some kind of resource issue?
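For reference, the discovery-related part of my elasticsearch.yml looks roughly like the following (this assumes the cloud-aws plugin that provides ec2 discovery; the cluster name, security group and region below are placeholders, not my real values):

cluster.name: my-cluster
discovery.type: ec2
discovery.ec2.groups: my-es-security-group
cloud.aws.region: us-east-1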

--
Robert

What do your logs show? There should be more than that.

Prior to the restart of the node there are pages and pages of the following:
[2015-08-14 20:31:10,323][DEBUG][action.admin.indices.stats] [esnode1] [logstash-2015.07.31][2], node[AZ1B_kkRPqJJBI09XZ3GcM], [R], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@3bda42c5]
org.elasticsearch.transport.SendRequestTransportException: [esnode2][inet[/esnode2:9300]][indices:monitor/stats[s]]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:286)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:249)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:183)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.start(TransportBroadcastOperationAction.java:151)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction.doExecute(TransportBroadcastOperationAction.java:71)
at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction.doExecute(TransportBroadcastOperationAction.java:47)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
at org.elasticsearch.cluster.InternalClusterInfoService.updateIndicesStats(InternalClusterInfoService.java:267)
at org.elasticsearch.cluster.InternalClusterInfoService$ClusterInfoUpdateJob.run(InternalClusterInfoService.java:356)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.transport.TransportException: TransportService is closed stopped can't send request
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:270)
... 11 more

But prior to the above there is also this:
[2015-08-14 20:31:08,881][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:784)
at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.handleDownstream(HttpPipeliningHandler.java:87)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:199)
at org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
at org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)

The same thing happens to me; I'm running Elasticsearch 2.0.

[2015-11-21 01:08:45,418][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at org.jboss.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at org.jboss.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
at org.jboss.netty.channel.Channels.write(Channels.java:725)
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:784)
at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.handleDownstream(HttpPipeliningHandler.java:87)
at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:174)
at org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
at org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
at org.elasticsearch.action.bulk.TransportBulkAction$2.finishHim(TransportBulkAction.java:353)
at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:325)
at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:314)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doFinish(TransportReplicationAction.java:983)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doRun(TransportReplicationAction.java:836)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.finishAndMoveToReplication(TransportReplicationAction.java:530)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:608)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Resolved, it was a problem with the Elasticsearch mapping and the reverse proxy.
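For anyone who hits the same gateway error: on the proxy side the main thing to check is the upstream timeout towards Elasticsearch. Just as an illustration (this is a generic nginx snippet, not my actual configuration), something like:

location / {
    proxy_pass http://localhost:9200;
    proxy_connect_timeout 10s;
    proxy_read_timeout 90s;
}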

Hi, how did you fix this particular problem with the reverse proxy and the mapping? Was it a quick fix in the yml for ES?