Increasing number of pending tasks despite small number of shards

crl · May 20, 2021, 3:58pm

Hi

I'm fairly new to Elasticsearch and I'm trying to maintain a small cluster.
Currently I'm having trouble with a growing number of pending tasks. In all the other threads I have looked at, the issue was caused by a large number of shards. However, I'm quite sure that is not the case here. Here is the output from /_cluster/health:

{
  "status": "yellow",
  "number_of_nodes": 3,
  "unassigned_shards": 3,
  "number_of_pending_tasks": 2254765,
  "number_of_in_flight_fetch": 0,
  "timed_out": false,
  "active_primary_shards": 218,
  "task_max_waiting_in_queue_millis": 43353576,
  "relocating_shards": 0,
  "active_shards_percent_as_number": 98.66071428571429,
  "active_shards": 221,
  "initializing_shards": 0,
  "number_of_data_nodes": 2,
  "delayed_unassigned_shards": 0
}
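
For readers who want to sanity-check these figures, here is a minimal sketch in Python. It assumes the percentage is computed as active shards over active + initializing + unassigned shards, which is consistent with the reported value; the helper name `summarize_health` is hypothetical and not part of any Elasticsearch client.

```python
# Subset of the /_cluster/health output quoted above.
health = {
    "active_shards": 221,
    "initializing_shards": 0,
    "unassigned_shards": 3,
    "number_of_pending_tasks": 2254765,
    "task_max_waiting_in_queue_millis": 43353576,
}


def summarize_health(h):
    """Derive a few human-readable figures from a cluster health response."""
    # Assumption: total = active + initializing + unassigned.
    total = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    return {
        "total_shards": total,
        "active_percent": 100.0 * h["active_shards"] / total,
        # Convert the longest queue wait from milliseconds to hours.
        "oldest_task_wait_hours": h["task_max_waiting_in_queue_millis"] / 3_600_000,
    }


summary = summarize_health(health)
print(summary)
```

Under that assumption the cluster has 224 shards in total, and the oldest pending task has been queued for roughly 12 hours, which suggests the queue is not draining at all rather than merely being slow.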

We are running without replicas except for a few select Elasticsearch system indices. The ILM is set to 20 GB or 30 days, with deletion after 60 days.

From what I understand this should not in any way be able to cause the issue that we are seeing.
The status is yellow because the periodic snapshot is failing. This might be the cause or at least have the same root cause.

The snapshot supposedly fails because there are 279 shard failures (primary shard is not allocated), but from what I can see this is not true.

We downgraded the "hot" node around the time the snapshot issue started. Some days later the node restarted because it ran out of memory, at which point the shards were in fact unavailable.

We tried to upgrade the node again, but it failed. However, every time we tried, the number of unassigned shards went down according to the overview, though not according to the snapshot menu.
(Side question: I couldn't find a way to get the shards assigned without attempting to change the config. Can anyone tell me how I could have done it?)

It seems that the cluster has ended up in a weird state where everything seems to be working except for the snapshot and the growing number of tasks.

Please help me learn and figure out how to fix this.

crl · May 26, 2021, 7:57am

I had to restart the cluster yesterday because the number of tasks had increased to more than 10M over the weekend and one of the nodes was running out of memory. Naturally, this cleared the pending tasks, but now, less than a day later, it is at 2.5M again.
I can't figure out which task is blocking, but it might be a template creation. At least the ES log is being spammed with this message multiple times a second:
[instance-0000000026] adding template [MyIndex] for index patterns [MyIndex-*]
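
One way to check which kind of task is filling the queue is to group the entries returned by GET /_cluster/pending_tasks by their `source` field. The following is a minimal sketch in Python that works on an already-parsed response; the sample data and the exact `source` strings are hypothetical stand-ins, not output from this cluster.

```python
from collections import Counter


def summarize_pending_tasks(response, top=5):
    """Count pending cluster tasks per `source` string, most frequent first."""
    tasks = response.get("tasks", [])
    counts = Counter(task.get("source", "<unknown>") for task in tasks)
    return counts.most_common(top)


# Hypothetical sample shaped like a GET /_cluster/pending_tasks response.
sample = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT",
         "source": "create-index-template [MyIndex], cause [api]",
         "time_in_queue_millis": 86},
        {"insert_order": 102, "priority": "URGENT",
         "source": "create-index-template [MyIndex], cause [api]",
         "time_in_queue_millis": 74},
        {"insert_order": 103, "priority": "NORMAL",
         "source": "cluster_reroute(reroute after starting shards)",
         "time_in_queue_millis": 61},
        {"insert_order": 104, "priority": "URGENT",
         "source": "create-index-template [MyIndex], cause [api]",
         "time_in_queue_millis": 50},
    ]
}

top_sources = summarize_pending_tasks(sample)
for source, count in top_sources:
    print(f"{count:>8}  {source}")
```

If a single template-related source dominates the real output, that would support the suspicion that a client is re-sending the same template on every request.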

Christian_Dahlqvist · May 26, 2021, 8:41am

Which version of Elasticsearch are you using? What is the specification of the cluster with respect to hardware? What type of storage are you using? Local SSDs?

crl · May 26, 2021, 9:16am

We are using Elasticsearch 7.5 and running in the cloud with 3 nodes (2 data nodes) all in the same zone:

aws.coordinating.m5 - up to 4 vCPU - 8 GB RAM - 32 GB disk
aws.data.highio.i3 - 3.8 vCPU - 29 GB RAM - 870 GB disk (shows as 928 GB with 370 GB used)
aws.data.highstorage.d2 - 3.8 vCPU - 29 GB RAM - 4.53 TB disk (shows as 4.59 TB with 1.4 TB used)

system · June 23, 2021, 9:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
