Elasticsearch 7.7 - all shards failed and task cancelled with channel closed

Hello all,

I set up a 3-node cluster using the bundled JRE on Windows Server 2016 machines with 12 GB RAM each (Java heap min/max 6 GB). It seemed to run fine under light traffic, but after hooking it up to a search GUI I started seeing this error:

    [2021-01-04T01:24:26,346][WARN ][o.e.m.j.JvmGcMonitorService] [node1] [gc][381772] overhead, spent [7s] collecting in the last [7.3s]
    [2021-01-04T01:24:26,153][WARN ][r.suppressed             ] [node1] path: /alias1,alias2/_search, params: {index=alias1,alias2}
    org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:231) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:119) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.CountedCollector.countDown(CountedCollector.java:53) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.CountedCollector.onFailure(CountedCollector.java:76) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase$2.onFailure(FetchSearchPhase.java:198) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:402) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1139) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1248) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1222) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:795) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService.access$100(TransportService.java:75) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:128) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:707) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:621) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService.sendChildRequest(TransportService.java:672) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TransportService.sendChildRequest(TransportService.java:664) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.SearchTransportService.sendExecuteFetch(SearchTransportService.java:172) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.SearchTransportService.sendExecuteFetch(SearchTransportService.java:162) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase.executeFetch(FetchSearchPhase.java:180) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:162) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:47) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.7.1.jar:7.7.1]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    	at java.lang.Thread.run(Thread.java:832) [?:?]
    Caused by: org.elasticsearch.ElasticsearchException$1: Task cancelled before it started: channel closed
    	at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:644) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:307) [elasticsearch-7.7.1.jar:7.7.1]
    	... 30 more
    Caused by: java.lang.IllegalStateException: Task cancelled before it started: channel closed
    	at org.elasticsearch.tasks.TaskManager.registerCancellableTask(TaskManager.java:141) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.tasks.TaskManager.register(TaskManager.java:122) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:60) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:225) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:186) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:119) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:103) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:676) ~[elasticsearch-7.7.1.jar:7.7.1]
    	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) ~[?:?]
    	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) ~[?:?]
    	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) ~[?:?]
    	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:227) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) ~[?:?]
    	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) ~[?:?]
    	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[?:?]
    	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
    	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
    	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
    	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
    	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
    	at java.lang.Thread.run(Thread.java:832) ~[?:?]
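
The first line of the log above is a `JvmGcMonitorService` warning reporting that the node spent 7s of the last 7.3s in garbage collection. A minimal sketch for pulling that ratio out of such lines follows. It assumes the message format shown above and only handles the `ms`, `s` and `m` unit suffixes; the function name is made up for illustration.

```python
import re

# Matches the "[gc][N] overhead, spent [X] collecting in the last [Y]" message
# as it appears in the log above. Only ms/s/m suffixes are handled (assumption).
GC_OVERHEAD = re.compile(
    r"\[gc\]\[\d+\] overhead, spent \[(?P<gc>[\d.]+)(?P<gc_unit>ms|s|m)\] "
    r"collecting in the last \[(?P<total>[\d.]+)(?P<total_unit>ms|s|m)\]"
)

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def gc_overhead_ratio(line):
    """Return the fraction of the interval spent in GC, or None if the line does not match."""
    m = GC_OVERHEAD.search(line)
    if m is None:
        return None
    gc = float(m["gc"]) * _UNIT_SECONDS[m["gc_unit"]]
    total = float(m["total"]) * _UNIT_SECONDS[m["total_unit"]]
    return gc / total if total else None


line = (
    "[2021-01-04T01:24:26,346][WARN ][o.e.m.j.JvmGcMonitorService] [node1] "
    "[gc][381772] overhead, spent [7s] collecting in the last [7.3s]"
)
print(f"{gc_overhead_ratio(line):.0%}")  # prints "96%"
```

A ratio this close to 1 means the JVM was doing almost nothing but garbage collection during that interval.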

I've seen some other posts with this error but I have not seen a solution.

Anyone have any ideas?

Thanks!

Jim

Welcome to our community! :smiley:

What GUI is this?

Do those aliases/indices mentioned exist?

Hi @jamesp,

Did you cancel an ongoing search request? Or did the connection between your search GUI and Elasticsearch get dropped?

Mark,

Thanks for the reply. We are using a custom-built JavaScript GUI. Those aliases are set up in the cluster and map back to indices that contain documents. I have seen this error multiple times and have increased the RAM on the servers from 4 to 6 GB, but I still get the error. The system does seem to recover and continues to return results. The cluster only contains about 1-2 million documents at the moment.
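
To see whether the heap is still under pressure after a memory change, the node stats API (`GET /_nodes/stats/jvm`) reports heap usage and GC counts per node. A rough sketch of reading it follows. The URL `http://localhost:9200` and the absence of authentication are assumptions (a cluster behind a security plugin would need credentials), the helper names are made up, and the sample values are invented purely to show the response shape.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """GET /_nodes/stats/jvm and return the parsed JSON body (no auth; assumption)."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm", timeout=10) as resp:
        return json.load(resp)


def heap_report(stats):
    """Map node name -> (heap_used_percent, old-generation GC collection count)."""
    report = {}
    for node in stats.get("nodes", {}).values():
        jvm = node["jvm"]
        old_gc = jvm["gc"]["collectors"]["old"]
        report[node["name"]] = (jvm["mem"]["heap_used_percent"], old_gc["collection_count"])
    return report


# Hand-written sample in the shape of the API response; the numbers are made up.
sample = {
    "nodes": {
        "abc123": {
            "name": "node1",
            "jvm": {
                "mem": {"heap_used_percent": 93},
                "gc": {"collectors": {"old": {"collection_count": 412}}},
            },
        }
    }
}
print(heap_report(sample))
```

Against a live cluster this would be called as `heap_report(fetch_jvm_stats())`. A heap that sits persistently high alongside a climbing old-generation count would be consistent with the GC overhead warning in the log.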

Jim

Nhat,

Thanks for the reply. I've seen a drop in connection, and I've tried to replicate one but have not succeeded.

Jim

I will add that I moved from version 6.2.3 to 7.7.1. The two indices behind the aliases listed were populated via reindex from a remote cluster: I created a new empty index for each, then reindexed the remote data into it in the 7.7 cluster.
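
For anyone following along, a reindex from a remote cluster is a `POST /_reindex` on the destination cluster with a `remote` block in the source. A small sketch of the request body follows. The host and index names are placeholders, not the ones actually used here, and the remote host also has to be listed under `reindex.remote.whitelist` in `elasticsearch.yml` on the destination nodes.

```python
import json


def remote_reindex_body(remote_host, source_index, dest_index):
    """Build the JSON body for POST /_reindex pulling from a remote cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},  # old cluster, e.g. the 6.x one
            "index": source_index,
        },
        "dest": {"index": dest_index},  # pre-created empty index on the new cluster
    }


# Placeholder host and index names, for illustration only.
body = remote_reindex_body("http://old-cluster.example:9200", "old_index", "new_index")
print(json.dumps(body, indent=2))
```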

We are also using ReadonlyREST, since I had some issues getting the basic X-Pack package to work.

Jim
