I have a new three-node Elasticsearch cluster that's giving me some
trouble. After a while of running smoothly (a couple hours, sometimes more,
sometimes less), one of the three nodes gets the following error about 10
times:
[2014-08-12 16:28:05,784][WARN ][http.netty               ] [elasticsearch-01] Caught exception while handling client http traffic, closing connection [id: 0x05bf114e, /10.10.100.10:45980 => /10.10.100.20:9200]
java.lang.IllegalArgumentException: empty text
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.<init>(HttpVersion.java:97)
    at org.elasticsearch.common.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:62)
    at org.elasticsearch.common.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:75)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:189)
    at org.elasticsearch.common.netty.handler.codec.http.HttpMessageDecoder.decode(HttpMessageDecoder.java:101)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shortly after, each of the three nodes encounters the following error
several dozen times:
[2014-08-12 18:10:11,638][DEBUG][action.search.type       ] [elasticsearch-01] [614] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [614]
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:480)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:450)
    at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:410)
    at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:407)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Restarting Elasticsearch on each of the nodes temporarily resolves the issue for another hour or two, until it all starts again.
Despite my best efforts, I haven't been able to pinpoint or reliably
reproduce the issue. I have noticed, though, that the first error (which
seems to trigger the second) always occurs on the same node. Furthermore, I
have a single-node cluster which has been supporting identical
traffic/queries on the same data without issue, leading me to believe that
the issue might be related to the cluster configuration.
Does anyone have thoughts about how I can go about
resolving/troubleshooting this?
The first error comes from an HTTP connection that did not send a valid HTTP request to the cluster node (i.e. one without an HTTP version on the request line). It could also be a connection attempt by a non-HTTP client, or a misconfiguration.
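As a way to check that theory, here is a minimal sketch (my own illustration, not from the original post) that sends a request line with no HTTP version token to a node; against a 1.x-era node using Netty's HTTP decoder this may reproduce the "empty text" warning. The host and port are copied from the log line above and are placeholders:

import socket

# Hypothetical reproduction: an HTTP/0.9-style request line with no
# "HTTP/1.x" version token. Netty's HTTP decoder may then see an empty
# version string and raise IllegalArgumentException: empty text.
HOST, PORT = "10.10.100.20", 9200   # node address taken from the log; adjust as needed

with socket.create_connection((HOST, PORT), timeout=5) as sock:
    sock.sendall(b"GET /\r\n\r\n")   # note: no HTTP version on the request line
    print(sock.recv(1024))           # the node logs the warning and closes the connection

A health check, monitoring probe, or port scan that speaks raw (or malformed) TCP to port 9200 would be one example of such a non-HTTP client.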
The second error is probably from a search request using scan/scroll where the scroll's keep-alive lifetime had already been exceeded. It could also be a runaway query.
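To illustrate the scan/scroll case, here is a sketch (again an assumption about the usage, not taken from the post) that opens a scroll with a short keep-alive and only asks for the next page after that keep-alive has passed; by then the node has discarded the search context and logs SearchContextMissingException. Host and index name are placeholders:

import time
import requests

ES = "http://10.10.100.20:9200"   # placeholder node address
INDEX = "my-index"                # placeholder index name

# Open a scan/scroll with a short keep-alive (1 minute).
resp = requests.get(
    ES + "/" + INDEX + "/_search",
    params={"search_type": "scan", "scroll": "1m", "size": 50},
).json()
scroll_id = resp["_scroll_id"]

# Wait past the keep-alive, then request the next page. The search
# context has already been freed, so the node answers with
# SearchContextMissingException: No search context found for id [...]
time.sleep(90)
page = requests.get(ES + "/_search/scroll",
                    params={"scroll": "1m", "scroll_id": scroll_id})
print(page.status_code, page.text[:200])

The same thing happens if a client keeps reusing an old scroll_id after the scroll has expired, or forgets to pass scroll= on the follow-up requests, so it may be worth checking how the clients querying the three-node cluster page through their results.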