Courier Fetch: x of y shards failed


(Gaurav Dalvi) #1

Hello All,

I have a one-node cluster that has been running for the last 20 days. It takes syslogs from 4 devices. It used to work fine until yesterday. Now when I refresh my Kibana page, it shows this error at the top. If I press OK it goes away, so it is basically a warning. I just want to know how I can resolve this problem.

I have
Kibana : 4.3.1
ES : 2.1.1
Logstash : 2.1.1

I searched here and found a possible solution using thread pools, i.e.:
threadpool.search.queue_size: 2000

I updated my elasticsearch.yml file with that, but after this change I am not able to start the ES node/cluster. Maybe this option is not supported in this ES version.
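For reference, this is roughly what I added to elasticsearch.yml (the value of 2000 is the one from the post I found; note that in 2.x the key is a flat, unindented top-level setting, so a YAML indentation mistake here can stop the node from starting):

```yaml
# elasticsearch.yml -- setting copied from the thread I found.
# Must be a top-level key with no leading indentation.
threadpool.search.queue_size: 2000
```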

Thanks,
Gaurav


(Mark Walkom) #2

What do your ES logs say? There should be something explaining the error.


(Gaurav Dalvi) #3

Thanks for showing interest. Here is a dump from the log file.

[2016-03-11 14:13:35,838][DEBUG][action.search.type ] [gaurav-node] [logstash-2016.02.26][0], node[a7_9Js2SSOe9M3APDASbwQ], [P], v[4], s[STARTED], a[id=UmCpYbDlQVCE5YoO6VeUDg]: Failed to execute [org.elasticsearch.action.search.SearchRequest@3321f4bf] lastShard [true]
RemoteTransportException[[gaurav-node][172.20.203.191:9300][indices:data/read/search[phase/query]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@14ed14f2 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@3f06eeb4[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 84579]]];
Caused by: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@14ed14f2 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@3f06eeb4[Running, pool size = 4, active threads = 4, queued tasks = 1000, completed tasks = 84579]]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:85)
at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:346)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:310)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:282)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:142)
at org.elasticsearch.action.search.type.TransportSearchCountAction$AsyncAction.sendExecuteFirstPhase(TransportSearchCountAction.java:72)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:166)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:148)
at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:56)
at org.elasticsearch.action.search.type.TransportSearchCountAction.doExecute(TransportSearchCountAction.java:45)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:107)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:63)
at org.elasticsearch.action.search.TransportMultiSearchAction.doExecute(TransportMultiSearchAction.java:39)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.client.support.AbstractClient.multiSearch(AbstractClient.java:600)
at org.elasticsearch.rest.action.search.RestMultiSearchAction.handleRequest(RestMultiSearchAction.java:74)
at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
at org.elasticsearch.rest.RestController.executeHandler(RestController.java:207)
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:166)
at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:128)
at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:86)
at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:348)
at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:63)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)


(Gaurav Dalvi) #4

The error is shown in the Kibana UI, but I don't see any logs generated for Kibana. I have not modified anything in the Kibana config file except the elasticsearch.url parameter.


(Mark Walkom) #5

Looks like your ES node is overwhelmed.


(Gaurav Dalvi) #6

Thanks. Do you know the solution to this problem?


(Mark Walkom) #7

Add more nodes, or increase the resources available to the node.
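On a single node, the usual first step is giving Elasticsearch more heap; on 2.x that is done with the ES_HEAP_SIZE environment variable before startup. A sketch (the 2g value is an example, not a recommendation — the common guidance is up to roughly half the machine's RAM, and below ~32 GB):

```shell
# Example only: set the heap size before starting Elasticsearch 2.x.
export ES_HEAP_SIZE=2g
./bin/elasticsearch
```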


(Gaurav Dalvi) #8

Appreciate your quick help.

The machine my one-node cluster is running on shows this:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rhel-root 18307072 5231160 13075912 29% /
devtmpfs 1932128 0 1932128 0% /dev
tmpfs 1941508 0 1941508 0% /dev/shm
tmpfs 1941508 180684 1760824 10% /run
tmpfs 1941508 0 1941508 0% /sys/fs/cgroup
/dev/sda1 508588 123284 385304 25% /boot

So I don't think I am using too many resources here. Do you know how I can avoid this error on this one-node cluster? I understand that adding more nodes would probably solve the problem, but I don't think I am using this machine at 99% of its capacity.

Thanks,
Gaurav


(Mark Walkom) #9

Right, but `df` only shows disk usage. What about CPU, memory, threadpools, and all that other stuff?
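For example, you can pull those numbers from the node itself (assuming the default HTTP port 9200). The `search.rejected` column will keep climbing if the search queue is the bottleneck, which matches the EsRejectedExecutionException in your log:

```shell
# Search threadpool activity, queue depth, and rejections per node
curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'

# JVM heap and OS-level CPU/memory stats
curl 'localhost:9200/_nodes/stats/jvm,os?pretty'
```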

