Cluster state displays disconnected nodes


(sooyeon-2) #1

Hi,

I see a lot of NodeNotConnectedException entries in the master log. It
seems the master is so busy pinging nonexistent nodes that it can't
maintain the correct cluster state.
I read a previous post about making sure embedded nodes are closed properly
before the application exits, so I've made that change going forward, but
the cluster has been pinging these nodes for hours now. Is there a way to
force those nodes to be ignored?

Issuing a shutdown request prints out the ghost nodes, but they don't go away.
curl -XPOST 'http://localhost:9200/_cluster/nodes/10.100.0.64/_shutdown?pretty=true'
{
  "cluster_name" : "es_cluster",
  "nodes" : {
    "lRtT5BClR720zt_gJzuniA" : {
      "name" : "Warstrike"
    },
    "9lnjZX57R7-D5lP5n5V2sg" : {
      "name" : "Windshear"
    },
    "sIwWBvXQTla907qz8XXSdQ" : {
      "name" : "Crichton, Kenneth"
    }
  }
}
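
To make it easier to compare the nodes in that response against the nodes that are actually running, here is a minimal Python sketch that pulls the node IDs and names out of the JSON. It assumes the response body has been captured as text; `response_text` and `list_nodes` are hypothetical names used only for illustration, not part of Elasticsearch.

```python
import json

# Response body copied from the _shutdown call above (assumed saved as text).
response_text = """
{
  "cluster_name" : "es_cluster",
  "nodes" : {
    "lRtT5BClR720zt_gJzuniA" : { "name" : "Warstrike" },
    "9lnjZX57R7-D5lP5n5V2sg" : { "name" : "Windshear" },
    "sIwWBvXQTla907qz8XXSdQ" : { "name" : "Crichton, Kenneth" }
  }
}
"""


def list_nodes(body):
    """Return (node_id, name) pairs from a response with a top-level "nodes" map."""
    nodes = json.loads(body).get("nodes", {})
    return [(node_id, info.get("name")) for node_id, info in nodes.items()]


# Print one line per node so the IDs can be matched against log entries.
for node_id, name in list_nodes(response_text):
    print(node_id, name)
```

The node ID `9lnjZX57R7-D5lP5n5V2sg` (Windshear) is the one that also appears in the stack trace below.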

Thanks!

FYI, here's the stacktrace:
[19:33:47,666][DEBUG][action.admin.cluster.node.stats] [prod-es-002-1-master] failed to execute on node [9lnjZX57R7-D5lP5n5V2sg]
org.elasticsearch.transport.SendRequestTransportException: [Windshear][inet[prod-es-001/10.100.0.64:9300]][cluster/nodes/stats/n]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:200)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.start(TransportNodesOperationAction.java:172)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$AsyncAction.access$300(TransportNodesOperationAction.java:102)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:73)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction.doExecute(TransportNodesOperationAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:61)
    at org.elasticsearch.client.node.NodeClusterAdminClient.execute(NodeClusterAdminClient.java:70)
    at org.elasticsearch.client.support.AbstractClusterAdminClient.nodesStats(AbstractClusterAdminClient.java:150)
    at org.elasticsearch.rest.action.admin.cluster.node.stats.RestNodesStatsAction.executeNodeStats(RestNodesStatsAction.java:130)
    at org.elasticsearch.rest.action.admin.cluster.node.stats.RestNodesStatsAction.handleRequest(RestNodesStatsAction.java:125)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:120)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:82)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:259)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:777)
    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:111)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:777)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.unfoldAndFireMessageReceived(ReplayingDecoder.java:522)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:501)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:438)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:777)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:553)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:343)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:274)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:194)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [Windshear][inet[prod-es-001/10.100.0.64:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:637)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:445)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:185)
    ... 42 more