Pending tasks queue

Hi Team,

Below is my cluster health. I am not able to ingest any logs into ES: as soon as the Logstash consumer starts using the Elasticsearch bulk API, documents stop getting indexed and my pending tasks queue piles up. I am not able to make my cluster stable.

```
curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "galaxy",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 10,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 33342,
  "active_shards" : 53879,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 12801,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 3334,
  "number_of_in_flight_fetch" : 0
}
```

Pending tasks on my master:

{"mon":{"what":"pending_tasks","queue_size":3490,"HIGH":3442,"NORMAL":44,"URGENT":4,"master":{"pri":"NORMAL","cnt":44,"avgT":3.30664e+06,"minT":2345897,"minInOrder":6060},"shard-started":{"pri":"URGENT","cnt":4,"avgT":13820.5,"minT":13687,"minInOrder":10473},"_add_listener_":{"pri":"HIGH","cnt":9,"avgT":3.7101e+06,"minT":2583947,"minInOrder":6077},"refresh-mapping":{"pri":"HIGH","cnt":3432,"avgT":1.89575e+06,"minT":174,"minInOrder":10001},"cluster_reroute":{"pri":"HIGH","cnt":1,"avgT":3859580,"minT":3859580,"minInOrder":6035}}}

Can you please help me get my cluster stable?

What version of Elasticsearch is this? 1.x? If so the answer is simple: upgrade to 2.x. You have a ton of shards in the cluster. In 2.0 Elasticsearch learned how to send cluster state updates out by diffing the new and old cluster state. Before that it sent the whole cluster state out after every action.

Reducing the number of shards will certainly help. Things like creating new indexes with fewer shards and deleting old indexes are the way to go for this.
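Deleting an old daily index is a one-liner; the index name below is just a made-up example, swap in your own:

```
# Drop an index you no longer need; its shards disappear from the cluster.
curl -XDELETE 'localhost:9200/logstash-2016.09.01'
```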

Nik's being nice :wink:
You have a ridiculously massive amount of shards, and that is the cause of the problem. Upgrade as he mentions, reduce your shard count too.

Thanks @warkolm @nik9000. I have been doing cleanup for the past 2 days and the old indices are almost cleared.

I am using:

```
"version" : {
  "number" : "1.7.2"
}
```

My current cluster state is:

```
{
  "cluster_name" : "galaxy",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 50,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 9352,
  "active_shards" : 17151,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1553,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 1,
  "number_of_in_flight_fetch" : 0
}
```

Are the shard counts fine now? I will also upgrade to 2.x, but this will mean bumping my Logstash version too.

For new indices I will reduce the primary shards to 3 and replicas to 1. Can you please guide me on how many shards should be on each data node?

e.g. I have 6 data nodes of type r3.2xlarge on AWS. How many shards can a particular node accommodate?
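This is roughly the template I plan to put in place for the new indices; the template name and index pattern below are just what I intend to use, not something already applied:

```
# Every newly created index matching the pattern gets 3 primaries + 1 replica.
curl -XPUT 'localhost:9200/_template/logs_3x1' -d '{
  "template" : "*-logs-*",
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}'
```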

Tip: wrap code-like stuff in ``` so it is easier to read.

In 2.x we don't really know "how many shard copies is too many". In 1.7 I'd say about 2000 is pushing it, especially given the number of nodes you need to effectively run that many shard copies. We also don't have a hard and fast rule for the number of shards a node can run. It depends on the size and what features you enable. And what version as well. Newer versions are almost always better on all combinations of features though.
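If you want to eyeball how the shards are spread across your data nodes, the cat API is handy:

```
# Shard count and disk usage per node, with column headers.
curl 'localhost:9200/_cat/allocation?v'
```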

Good work slashing the size down! 2.x probably would have been much more stable, but, yeah, that means a major upgrade.

Thanks @nik9000 for being nice :wink: and thanks @warkolm too. I have reduced the shard count now:

  "cluster_name" : "galaxy",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 56,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 2254,
  "active_shards" : 4219,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 289,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}```

The cluster seems to be stable now. I will also go for the upgrade even though it is a pain, as broadcasting the whole cluster state with this many shards is too costly. Thanks all for your help.

Great stuff!

Thanks @warkolm. Now I am facing another issue with my cluster. The ingestion rate on my cluster looks good, but it is only half of the publish rate into my RabbitMQ, so my queue size is around 80M. I can also see that my replicas have been in an unassigned state for a long time and are not initializing, so when Kibana queries those shards I get a SearchPhaseExecutionException.
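Would it make sense to kick the allocator with an empty reroute, something like this?

```
# With no commands in the body this just asks the master to run
# another allocation pass over the unassigned shards.
curl -XPOST 'localhost:9200/_cluster/reroute?pretty'
```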

Aside from this, logs are not getting indexed properly. Can you guys please help me figure out whether this is due to dynamic mapping causing issues with indexing? Dynamic mapping is what we use as of now; a rough sketch of the mapping template I have in mind is at the end of this post.

```
xxxx-nginx-logs-2016.10.04       1 r UNASSIGNED
xxxx-nginx-logs-2016.10.04       4 r UNASSIGNED
xxxx-nginx-logs-2016.10.03       3 r UNASSIGNED
xxxx-nginx-error-logs-2016.10.04 2 r UNASSIGNED
```

  "cluster_name" : "galaxy",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 34,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 2752,
  "active_shards" : 5207,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 297,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}```
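Here is the rough sketch of the template I have in mind for pinning the mappings down; the fields below are only examples from my nginx logs, not the full mapping:

```
# Predeclaring the fields means a new document no longer triggers a
# cluster-wide refresh-mapping task for every unseen field.
curl -XPUT 'localhost:9200/_template/nginx_logs_mapping' -d '{
  "template" : "xxxx-nginx-logs-*",
  "mappings" : {
    "_default_" : {
      "properties" : {
        "status" :  { "type" : "integer" },
        "request" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'
```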