Rejected execution (queue capacity 1000)

elbori76 · June 19, 2017, 3:14pm

I'm seeing the following error in our Elasticsearch cluster logs:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler

I captured the following thread pool stats on one occasion while a search query was hanging:
http://node01.testing.local:9200/_nodes/node01/stats/thread_pool?human&pretty
"search": {
  "threads": 24,
  "queue": 1000,
  "active": 24,
  "rejected": 300,
  "largest": 24,
  "completed": 141910
}

Cluster Information
We have an 11-node ELK cluster: 3 master nodes, 5 data nodes, and 3 Logstash nodes.

Data nodes are configured as follows:

8 vCPUs
64 GB RAM (32 GB allocated to ES_HEAP_SIZE)
1.4 TB iSCSI volume with 700 GB used
Swap is disabled

polyfractal · June 20, 2017, 9:32pm

Well, it basically means that you've got 1000 search requests queued up waiting for a free search thread, and once that queue limit is reached ES just starts rejecting new requests instead of queuing them.
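
To make the rejection behavior concrete, here is a toy model of a fixed-size thread pool with a bounded wait queue. It is only a sketch of the general idea, not Elasticsearch's actual implementation: the class `BoundedPoolModel` and its methods are hypothetical names invented for illustration, and the numbers are chosen to mirror the stats in the original post (24 threads, queue capacity 1000).

```python
from collections import deque


class BoundedPoolModel:
    """Toy model (hypothetical, not ES code) of a fixed thread pool with a bounded queue."""

    def __init__(self, threads, queue_capacity):
        self.threads = threads
        self.queue_capacity = queue_capacity
        self.active = 0          # tasks currently occupying a thread
        self.queue = deque()     # tasks waiting for a free thread
        self.rejected = 0
        self.completed = 0

    def submit(self, task):
        """Return True if the task was accepted, False if it was rejected."""
        if self.active < self.threads:
            self.active += 1             # a free thread picks it up immediately
            return True
        if len(self.queue) < self.queue_capacity:
            self.queue.append(task)      # all threads busy: wait in the queue
            return True
        self.rejected += 1               # queue is full: reject the task
        return False

    def finish_one(self):
        """One running task finishes; the freed thread takes the next queued task."""
        if self.active == 0:
            return
        self.completed += 1
        if self.queue:
            self.queue.popleft()         # thread stays busy with the dequeued task
        else:
            self.active -= 1


# Submit a burst of 1324 tasks while nothing finishes (e.g. every thread is stuck).
pool = BoundedPoolModel(threads=24, queue_capacity=1000)
accepted = [pool.submit(f"search-{i}") for i in range(1324)]
print(pool.active, len(pool.queue), pool.rejected)  # prints: 24 1000 300
```

In this model nothing is rejected until all 24 threads are busy and all 1000 queue slots are taken, so rejections are a symptom of work arriving faster than threads free up, whatever the underlying cause.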

So you'll need to figure out the bottleneck. Some options:

Your clients are simply sending too many queries in a fast burst, overwhelming the queue. You can monitor this with Node Stats over time to see whether the load is bursty or smooth.
You've got some very slow queries which get "stuck" for a long time, eating up threads and causing the queue to back up. You can enable the slow log to see if there are queries that are taking an exceptionally long time, then try to tune those.
There may be "unending" scripts written in Groovy or something similar, e.g. a loop that never exits, causing the thread to spin forever.
Your hardware may be under-provisioned for your workload and bottlenecking on some resource (disk, CPU, etc.).
A temporary hiccup from your iSCSI target, which causes all the in-flight operations to block waiting for the disks to come back. It wouldn't take a big latency hiccup to seriously back up a busy cluster. ES generally expects disks to always be available.
Heavy garbage collections could cause problems too. Check Node Stats to see if there are many or long old gen GCs running.
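
Several of the checks above come down to comparing Node Stats snapshots over time. The following is a minimal sketch of how that could be done, assuming the response contains `thread_pool.search` counters and `jvm.gc.collectors.old` counters for each node. The helper names (`fetch_node_stats`, `summarize`, `delta`, `sample`) are hypothetical, and the sample numbers other than those quoted in the original post are made up for illustration.

```python
import json
import urllib.request


def fetch_node_stats(base_url, node_id):
    """Fetch thread pool and JVM stats for one node (requires a reachable cluster)."""
    url = f"{base_url}/_nodes/{node_id}/stats/thread_pool,jvm"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(stats):
    """Reduce a Node Stats response to a few counters, keyed by node name."""
    summary = {}
    for node in stats["nodes"].values():
        search = node["thread_pool"]["search"]
        old_gc = node["jvm"]["gc"]["collectors"]["old"]
        summary[node["name"]] = {
            "queue": search["queue"],
            "rejected": search["rejected"],
            "completed": search["completed"],
            "old_gc_count": old_gc["collection_count"],
            "old_gc_millis": old_gc["collection_time_in_millis"],
        }
    return summary


def delta(before, after):
    """Change in the cumulative counters between two summaries, per node."""
    changes = {}
    for name, now in after.items():
        prev = before.get(name)
        if prev is None:
            continue  # node was not present in the earlier snapshot
        changes[name] = {
            "queue_now": now["queue"],  # queue is a gauge, so report its current value
            "rejected": now["rejected"] - prev["rejected"],
            "completed": now["completed"] - prev["completed"],
            "old_gc_count": now["old_gc_count"] - prev["old_gc_count"],
            "old_gc_millis": now["old_gc_millis"] - prev["old_gc_millis"],
        }
    return changes


def sample(queue, rejected, completed, gc_count, gc_millis):
    """Build a fake Node Stats response (illustrative data, not real output)."""
    return {"nodes": {"abc123": {
        "name": "node01",
        "thread_pool": {"search": {"queue": queue, "rejected": rejected,
                                   "completed": completed}},
        "jvm": {"gc": {"collectors": {"old": {
            "collection_count": gc_count,
            "collection_time_in_millis": gc_millis}}}},
    }}}


first = summarize(sample(1000, 300, 141910, 12, 4800))
second = summarize(sample(1000, 420, 142500, 14, 9300))
changes = delta(first, second)
print(changes["node01"])
```

Against a live cluster, one would call `summarize(fetch_node_stats("http://node01.testing.local:9200", "node01"))` at a fixed interval and diff consecutive results. A `rejected` delta that jumps in short spikes points to bursty clients, a steadily full queue with a low `completed` delta points to slow or stuck queries, and large `old_gc_millis` deltas point to GC pressure.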

