Cluster pending_tasks - what do they mean?

vnagendra · April 26, 2016, 7:00pm

We have a cluster with a high number of pending tasks. An example is below:

 {
    "insert_order" : 17373233,
    "priority" : "NORMAL",
    "source" : "indices_store ([[logstash-CUSTOMER-PRODUCT-2016.03.18][2]] active fully on other nodes)",
    "executing" : false,
    "time_in_queue_millis" : 1209415567,
    "time_in_queue" : "13.9d"
  },

When we ran curl -XGET 'http://localhost:9200/_cluster/pending_tasks?pretty=true' > pending_tasks.json, we got back about 251 MB of data!

total 515160
drwxr-xr-x  5 vasu  staff   170B Apr 26 13:39 .
drwxr-xr-x  7 vasu  staff   238B Apr 26 13:34 ..
-rw-r--r--  1 vasu  staff    77K Apr 26 13:38 nodes.json
-rw-r--r--  1 vasu  staff   5.9K Apr 26 13:39 nodes_abbreviated.json
-rw-r--r--  1 vasu  staff   251M Apr 26 13:37 pending_tasks.json

According to the API docs, the number of pending tasks should generally be ZERO. In "rare cases" when the master is the bottleneck it can be high, but tasks should only wait for a few thousand milliseconds. We are seeing DAYS for these values.

There is obviously something wrong (perhaps). Any suggestions on how to go about understanding what is actually wrong? Almost all the tasks are of the same form as above "indices_store .... everything is normal..blah".
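One way to get a handle on a dump this size is to group the tasks by the first token of "source" and look at the longest wait. A minimal Python sketch, assuming the response has a top-level "tasks" array whose entries carry the fields shown in the example above. The names source_kind and summarize are made up for illustration, and cutting the source at the first space or "[" is only a heuristic:

```python
from collections import Counter

MS_PER_DAY = 86_400_000

def source_kind(source):
    # Heuristic: keep the text before the first space or "[",
    # e.g. "indices_store ([[idx][2]] ...)" -> "indices_store".
    return source.split(" ", 1)[0].split("[", 1)[0]

def summarize(doc):
    # Returns (Counter of task kinds, longest wait in days).
    tasks = doc.get("tasks", [])
    kinds = Counter(source_kind(t["source"]) for t in tasks)
    oldest_ms = max((t["time_in_queue_millis"] for t in tasks), default=0)
    return kinds, oldest_ms / MS_PER_DAY

# Sample input built from the single task quoted in this post.
sample = {
    "tasks": [
        {
            "insert_order": 17373233,
            "priority": "NORMAL",
            "source": "indices_store ([[logstash-CUSTOMER-PRODUCT-2016.03.18][2]] active fully on other nodes)",
            "executing": False,
            "time_in_queue_millis": 1209415567,
            "time_in_queue": "13.9d",
        }
    ]
}

kinds, oldest_days = summarize(sample)
print(kinds.most_common(5))
print(f"longest wait: {oldest_days:.2f} days")
```

For the real dump, sample could be replaced with the result of json.load on pending_tasks.json. Loading 251 MB that way needs a fair amount of memory, so a streaming parser may be a better fit.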

A few days ago, our masters started having problems and were falling over. That has since been mitigated. These are perhaps old pending tasks that never got deleted? (our current best guess)

Our ES Version is 2.1.2, and the rest of the information is below.

ec2-user@Elasticsearch:~> curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "Elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 12,
  "number_of_data_nodes" : 9,
  "active_primary_shards" : 5023,
  "active_shards" : 10046,
  "relocating_shards" : 2,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 1165671,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 1504202,
  "active_shards_percent_as_number" : 100.0
}

Thanks!

vnagendra · April 26, 2016, 7:33pm

Small update on this. We killed the master (after creating 3 dedicated master nodes). The number of pending tasks "during" the shard allocation is now as follows:

ec2-user@Elasticsearch:~> curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "Elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 12,
  "number_of_data_nodes" : 9,
  "active_primary_shards" : 5023,
  "active_shards" : 9018,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 1026,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 3,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 117141,
  "active_shards_percent_as_number" : 89.76707147123233
}

The number of pending tasks is back to normal compared with what it was before. If anyone has any ideas on what could have happened previously, that would be awesome.
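To put the two health outputs side by side, here is a small Python sketch that pulls out the pending-task count and converts the longest queue wait to minutes. pending_summary is a hypothetical helper, and the numbers are copied from the two _cluster/health outputs in this thread:

```python
def pending_summary(health):
    # Returns (pending task count, longest queue wait in minutes).
    count = health["number_of_pending_tasks"]
    max_wait_min = health["task_max_waiting_in_queue_millis"] / 60_000
    return count, max_wait_min

# Values copied from the health outputs before and after the master restart.
before = {
    "number_of_pending_tasks": 1165671,
    "task_max_waiting_in_queue_millis": 1504202,
}
after = {
    "number_of_pending_tasks": 3,
    "task_max_waiting_in_queue_millis": 117141,
}

for label, health in (("before", before), ("after", after)):
    count, minutes = pending_summary(health)
    print(f"{label}: {count} pending, longest wait {minutes:.1f} min")
```

Running something like this against periodic health snapshots would show whether the queue is draining or growing again.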

Previously we did NOT have dedicated master nodes, as our throughput to ES is really pretty small. After 4 months, we have just shy of 5 GB of data in ES.

warkolm · April 27, 2016, 5:38am

That's pretty impressive! But also bad

What sort of data do you have in the cluster? How many queries per second? How large are your nodes? What sort of tasks are in the queue?
