es_rejected_execution_exception with the failing index changing on each query

Hello team,

After restarting the Elasticsearch service several times and increasing thread_pool.bulk.queue_size, I get an error when I run the following query:

q = {
  "query": {
    "match_all": {}
  }
}

res ='my_index', body=q)

I get the following response:

{'took': 38,
 'timed_out': False,
 'num_reduce_phases': 3,
 '_shards': {'total': 1396,
  'successful': 1395,
  'failed': 1,
  'failures': [{'shard': 1,
    'index': 'SOME_OTHER_INDEX',
    'node': 'My_node',
    'reason': {'type': 'es_rejected_execution_exception',
     'reason': 'rejected execution of org.elasticsearch.transport.TransportService$7@37f5c738 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5c51cea6[Running, pool size = 13, active threads = 13, queued tasks = 996, completed tasks = 4347]]'}}]},
 'hits': {'total': 0, 'max_score': None, 'hits': []}}

And when I relaunch the query, the failure moves to a different index each time. I checked the cluster's health, but everything seems normal as you can see:

curl localhost:9201/_cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1396,
  "active_shards" : 1396,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Any thoughts on why this is happening?


You have far too many shards. Elasticsearch creates a task for each shard to be searched, so that's 1396 search tasks (plus some for coordination). To protect the cluster, Elasticsearch rejects attempts to queue that many tasks at once. The solution is to reduce the number of shards in your cluster. Here is an article on this subject:
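You can see the mismatch directly in the numbers from your error message; a rough, illustrative calculation:

```python
# Back-of-the-envelope check using the numbers in the rejection message above.
pool_size = 13         # "pool size = 13" -- concurrent search threads on the node
queue_capacity = 1000  # "queue capacity = 1000" -- the node's search queue

# One search task is created per shard searched:
search_tasks = 1396    # total shards hit by the match_all query

# Tasks the node can hold at once (running + queued) before it starts rejecting:
max_in_flight = pool_size + queue_capacity

print(search_tasks > max_in_flight)  # True: one query alone overflows the queue
```

Which shard's task gets rejected depends on timing, which is why the failing index changes on every run.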

I have hundreds of indices; is it possible to have indices sharing one shard? And once I know that I must shrink indices, is it possible to shrink several of them at once? The Shrink index page seems to show that it can only be done one by one.

Shrinking an index can be used to combine shards within that index, but not across indices, because different indices may have different mappings and may contain multiple documents with the same ID.

If you want to combine some of your indices together, I think the best way forward would be to reindex them.
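As a sketch, reindexing several small indices into one target is a single request; the index names below are made up for illustration, not taken from your cluster:

```python
# Hypothetical sketch: merging several small indices into one target index.
# "logs-2023.*" and "logs-2023-merged" are invented names for illustration.
body = {
    "source": {"index": "logs-2023.*"},     # pattern matching the source indices
    "dest": {"index": "logs-2023-merged"},  # single target index
}

# With an elasticsearch-py client this could be submitted as, e.g.:
# es.reindex(body=body, wait_for_completion=False)  # poll the returned task id
```

Running it with wait_for_completion=False returns a task you can poll, which is safer for large reindex jobs.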

I have exactly 0 intention of combining indices. I want to know how I am supposed to deal with the fact that, according to you, I have too many shards, when in the meantime I cannot shrink them across indices. What would be a good range for the number of shards per cluster? How can I increase the queue capacity so that my Elasticsearch will function again?

This is covered in the article I linked to, but here are some choice quotes:

Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.

A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured.

The queue capacity seems correct to me; it's the shard count that needs work.
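For what it's worth, the rule of thumb quoted above gives a quick way to estimate a node's shard budget; the heap sizes here are illustrative, not taken from your setup:

```python
# Rule of thumb from the quoted article: keep the number of shards per node
# below ~20 per GB of configured heap.
SHARDS_PER_GB_HEAP = 20

def shard_budget(heap_gb):
    """Rough upper bound on the shards a single node should host."""
    return SHARDS_PER_GB_HEAP * heap_gb

# Illustrative heap sizes: even a large 31 GB heap tops out around 620 shards,
# well under the 1396 shards currently on this single-node cluster.
print(shard_budget(1), shard_budget(31))  # 20 620
```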

I will modify my structure to fit the best practices, thank you for your time.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.