es_rejected_execution - Elasticsearch not indexing or processing data


(Marius) #1

Hello,

we are using Elasticsearch to create daily indices of our log files. This means that every night at 00:00 UTC the new indices for the day are created.

For the past two days we have been having issues in one of our clusters at exactly that point in time.
Some new indices are created, but they contain no data. The rest of the indices are not created at all. The pending task queue is also growing significantly, while very few tasks are being processed. Cluster health is green the whole time, with 0 shards relocating, initializing, unassigned or delayed.

The first day I was able to get ES to process data again by executing:
curl -XPUT localhost:19210/_cluster/settings -d '{ "transient" : { "threadpool.bulk.queue_size" : 1000 } }'

After that, all new indices were created and new data started flowing in again.
The next day the exact same problem occurred. I ran the same command again, this time with a queue size of 1100, and it fixed the issue again.

However, I am not sure whether this is really related to the command, or whether indexing simply resumed because the command flushed the task queue. I also tried performing rolling restarts of the master nodes to flush the task queue, but even when that flushed the queue, it didn't help.

So either it is related to the bulk queue size, or I got lucky twice by flushing the queue at the right point in time.

ES version: 2.4.1
Logstash versions: 5.3.0 and 2.4.1, running in parallel at the moment

Additional log information:

Very high number of pending tasks with:
"tasks" : [ { "insert_order" : 183826, "priority" : "URGENT", "source" : "create-index-template [metricbeat], cause [api]", "executing" : true, "time_in_queue_millis" : 5655, "time_in_queue" : "5.6s" }, { "insert_order" : 183830,
and

{ "insert_order" : 183939, "priority" : "HIGH", "source" : "_add_listener_", "executing" : false, "time_in_queue_millis" : 108, "time_in_queue" : "108ms" }

Logstash logs are filled with:
[2017-05-27T12:38:44,365][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$4@7cdd3880 on EsThreadPoolExecutor[bulk, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@562cd8c1[Running, pool size = 32, active threads = 32, queued tasks = 1852, completed tasks = 8490655]]"})
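To see whether the bulk queue really is the bottleneck, the per-node bulk thread-pool usage and rejection counters can be watched directly. A sketch, assuming the same host and port as the earlier curl command; the `_cat/thread_pool` API and these column names are available in ES 2.x:

```shell
# Show bulk thread-pool activity and cumulative rejection counts per node;
# a steadily growing bulk.rejected column confirms the 429s come from a full bulk queue
curl 'localhost:19210/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'
```

Running this during the nightly index creation window would show whether the queue fills up only at 00:00 UTC or stays saturated.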

Can you help me out here? It looks like increasing the value was only a short-term fix and the problem will recur.

Best Regards!


(Mark Walkom) #2

How many nodes and shards, and how much data?


(Marius) #3
  • 16 Nodes
  • 6 Data nodes
  • 10609 Active primary shards
  • 31827 Active shards
  • about 6 TB of data

(Mark Walkom) #4

That's waaaaaayyy too many shards and likely causing your issues.


(Marius) #5

As I described, we are creating daily indices for storing our logs. The index sizes range from a few MB up to 20 GB per day. Do you recommend switching to weekly indices to decrease the number?

We also have 2 shards and 2 replicas configured per index.
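For context, that configuration multiplies quickly: with 2 primaries and 2 replicas, every daily index carries 6 shard copies, which roughly matches the ~31,800 active shards reported above. A quick sanity check:

```shell
# Shard copies per daily index: primaries * (1 + replicas)
echo $(( 2 * (1 + 2) ))    # 6 shard copies per index

# Daily indices implied by the 31827 active shards reported above
echo $(( 31827 / 6 ))      # roughly 5304 indices
```

Every one of those shards costs cluster-state and master-node overhead, which is consistent with the pending-task pile-up at index-creation time.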


(David Pilato) #6

1 shard is probably enough. Then you can think about using the rollover API so that you maximize the number of docs per shard.
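A sketch of how the rollover API works. Note that `_rollover` was introduced in Elasticsearch 5.0, so using it would require an upgrade from 2.4.1; the index name, alias name, and condition values below are made up for illustration:

```shell
# Create the first index with a write alias (illustrative names)
curl -XPUT 'localhost:9200/logs-000001' -d '{
  "aliases": { "logs_write": {} }
}'

# Roll over to a new index once the current one is old or full enough,
# instead of creating a new index every night regardless of size
curl -XPOST 'localhost:9200/logs_write/_rollover' -d '{
  "conditions": {
    "max_age": "7d",
    "max_docs": 50000000
  }
}'
```

Indexing always goes through the alias, so low-volume days simply keep filling the current index rather than producing a new, nearly empty one.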


(Christian Dahlqvist) #7

As a general rule of thumb, I would recommend aiming for shard sizes between a few GB and a few tens of GB. For indices with low volumes of data you should therefore consider consolidating them or switching to weekly or even monthly indices.
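To find which indices fall outside that range, shard sizes can be listed per index. A sketch, assuming the port from the earlier commands; the `_cat/shards` API and these columns are available in ES 2.x:

```shell
# List index name, shard number, primary/replica flag, and on-disk size per shard;
# shards far below a few GB are candidates for consolidation or weekly indices
curl 'localhost:19210/_cat/shards?v&h=index,shard,prirep,store'
```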


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.