So we seem to be having recurring incidents of ES nodes getting into a very odd state. In this particular case, one node became unresponsive to our monitoring's test polls. I'm not really sure what to make of it: while this is going on the cluster still reports green, yet the borked node keeps trying to service traffic, so our app is sporadically failing in the meantime.
[UTC Feb 5 22:29:23] error : 'prod_elasticsearch_cluster_health' failed protocol test [HTTP] at INET[10.180.48.216:9200/_cluster/health] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[UTC Feb 5 22:30:28] error : 'prod_elasticsearch_cluster_health' failed protocol test [HTTP] at INET[10.180.48.216:9200/_cluster/health] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[UTC Feb 5 22:31:33] error : 'prod_elasticsearch_cluster_health' failed protocol test [HTTP] at INET[10.180.48.216:9200/_cluster/health] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
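For context, the check that's failing there is just a dumb HTTP poll of /_cluster/health against that one node. A throwaway sketch along these lines (hypothetical host list, not our actual monit config) shows why the situation is confusing: polled directly, the wedged node times out while every other node happily reports green.

#!/usr/bin/env python3
# Rough sketch only: poll /_cluster/health on each node directly with a short
# timeout, so a single unresponsive node stands out even while cluster-level
# health stays green. Hostnames below are illustrative.
import json
import urllib.request

NODES = [
    "prod-es-r02.ihost.brewster.com",
    "prod-es-r04.ihost.brewster.com",
    "prod-es-r06.ihost.brewster.com",
    "prod-es-r07.ihost.brewster.com",
    "prod-es-r08.ihost.brewster.com",
]

for host in NODES:
    url = "http://%s:9200/_cluster/health" % host
    try:
        resp = urllib.request.urlopen(url, timeout=5)
        health = json.loads(resp.read().decode("utf-8"))
        print("%s: %s (%d nodes)" % (host, health["status"], health["number_of_nodes"]))
    except Exception as exc:  # timeouts / resets from the wedged node land here
        print("%s: UNRESPONSIVE (%s)" % (host, exc))

In this incident that's exactly the split: cluster-level status stayed green, and only the direct poll of prod-es-r07 (10.180.48.216) ever failed.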
Here are the corresponding logs from the node in question.
First, some of this:
[2012-02-05 22:32:11,552][INFO ][discovery.zen ] [prod-es-r07] master_left [[prod-es-r08][mlrGPzm3QeCm7d_E_Lvozg][inet[prod-es-r08.ihost.brewster.com/10.180.48.255:9300]]], reason [no longer master]
[2012-02-05 22:32:11,552][INFO ][cluster.service ] [prod-es-r07] master {new [prod-es-r04][uOUyy7p_TBuNEbwmqWF9-w][inet[prod-es-r04.ihost.brewster.com/10.180.35.110:9300]], previous [prod-es-r08][mlrGPzm3QeCm7d_E_Lvozg][inet[prod-es-r08.ihost.brewster.com/10.180.48.255:9300]]}, removed {[prod-es-r08][mlrGPzm3QeCm7d_E_Lvozg][inet[prod-es-r08.ihost.brewster.com/10.180.48.255:9300]],}, reason: zen-disco-master_failed ([prod-es-r08][mlrGPzm3QeCm7d_E_Lvozg][inet[prod-es-r08.ihost.brewster.com/10.180.48.255:9300]])
[2012-02-05 22:32:12,557][INFO ][discovery.zen ] [prod-es-r07] master_left [[prod-es-r04][uOUyy7p_TBuNEbwmqWF9-w][inet[prod-es-r04.ihost.brewster.com/10.180.35.110:9300]]], reason [no longer master]
[2012-02-05 22:32:12,558][WARN ][discovery.zen ] [prod-es-r07] not enough master nodes after master left (reason = no longer master), current nodes: {[prod-es-r02][uuh4KmeHR-eUeIr7J97zCg][inet[prod-es-r02.ihost.brewster.com/10.182.14.95:9300]],[prod-es-r07][zqJRs5e6S5eWfL0kVuolJg][inet[prod-es-r07.ihost.brewster.com/10.180.48.216:9300]],}
[2012-02-05 22:32:12,559][INFO ][cluster.service ] [prod-es-r07] removed {[prod-es-r02][uuh4KmeHR-eUeIr7J97zCg][inet[prod-es-r02.ihost.brewster.com/10.182.14.95:9300]],[prod-es-r04][uOUyy7p_TBuNEbwmqWF9-w][inet[prod-es-r04.ihost.brewster.com/10.180.35.110:9300]],}, reason: zen-disco-master_failed ([prod-es-r04][uOUyy7p_TBuNEbwmqWF9-w][inet[prod-es-r04.ihost.brewster.com/10.180.35.110:9300]])
[2012-02-05 22:32:12,565][WARN ][http.netty ] [prod-es-r07] Caught exception while handling client http traffic, closing connection [id: 0x09be0e53, /10.180.48.216:54645 => /10.180.48.216:9200]
java.lang.IllegalArgumentException: empty text
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:103)
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:68)
    at org.elasticsearch.common.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:81)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:198)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:107)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:470)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:81)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
[2012-02-05 22:32:12,814][DEBUG][action.search.type ] [prod-es-r07] Node [Bbaoza_KTP2DJQgxM4JN-A] not available for scroll request [scan;1;5092799:Bbaoza_KTP2DJQgxM4JN-A;1;total_hits:7200;]
[2012-02-05 22:32:12,815][DEBUG][action.search.type ] [prod-es-r07] Node [Bbaoza_KTP2DJQgxM4JN-A] not available for scroll request [scan;1;5092799:Bbaoza_KTP2DJQgxM4JN-A;1;total_hits:7200;]
[2012-02-05 22:32:14,066][WARN ][http.netty ] [prod-es-r07] Caught exception while handling client http traffic, closing connection [id: 0x2cb594cf, /10.180.48.216:54651 => /10.180.48.216:9200]
Followed by tons of this:
[2012-02-05 22:32:24,572][DEBUG][action.search.type ] [prod-es-r07] [contact_documents-33-0][0], node[ar6qMqYnRSm5f0zvpKDirA], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5c67fa3d]
org.elasticsearch.transport.SendRequestTransportException: [prod-es-r06][inet[prod-es-r06.ihost.brewster.com/10.180.46.203:9300]][search/phase/query]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:196)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:168)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:140)
    at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:74)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:279)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:211)
    at org.elasticsearch.search.action.SearchServiceTransportAction$2.handleException(SearchServiceTransportAction.java:151)
    at org.elasticsearch.transport.TransportService$2.run(TransportService.java:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [prod-es-r06][inet[prod-es-r06.ihost.brewster.com/10.180.46.203:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:636)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:448)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:181)
    ... 11 more
[2012-02-05 22:32:24,572][DEBUG][action.search.type ] [prod-es-r07] [contact_documents-4-0][0], node[ar6qMqYnRSm5f0zvpKDirA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@718585ec]
org.elasticsearch.transport.SendRequestTransportException: [prod-es-r06][inet[prod-es-r06.ihost.brewster.com/10.180.46.203:9300]][search/phase/query]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:196)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:168)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:140)
    at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:74)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:279)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:211)
    at org.elasticsearch.search.action.SearchServiceTransportAction$2.handleException(SearchServiceTransportAction.java:151)
    at org.elasticsearch.transport.TransportService$2.run(TransportService.java:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [prod-es-r06][inet[prod-es-r06.ihost.brewster.com/10.180.46.203:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:636)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:448)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:181)
    ... 11 more
etc
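One more data point from the excerpt above: when prod-es-r07 logged "not enough master nodes after master left", it could only see itself and prod-es-r02, so presumably the discovery.zen.minimum_master_nodes quorum check kicked in and it refused to elect a master on its own. For reference, the usual majority formula for that setting looks like this (the node count here is a guess, not our actual topology):

# Majority quorum sketch for discovery.zen.minimum_master_nodes.
# master_eligible is an assumed count; substitute the real number of
# master-eligible nodes in the cluster.
master_eligible = 6
minimum_master_nodes = master_eligible // 2 + 1
print(minimum_master_nodes)  # -> 4 for a 6-node cluster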