Stuck pending tasks

(Jason) #1

About 10 minutes after I set cluster.routing.allocation.disk.threshold_enabled to true, over 4,000 tasks (see the example below) were queued, and they appear to be stuck. Roughly once every 30 minutes, only the top one or two tasks are removed from the queue.

Any idea why these tasks are queued up?


Cluster Status:

cluster_name: "search",
status: "yellow",
timed_out: false,
number_of_nodes: 6,
number_of_data_nodes: 6,
active_primary_shards: 54110,
active_shards: 162326,
relocating_shards: 500,
initializing_shards: 0,
unassigned_shards: 4
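The health figures above are enough to work out how many copies each primary has and how many shard copies each node carries. Here is a minimal sketch of that arithmetic using only the numbers from the post. It assumes every index uses the same replica count, so the average number of copies per primary reflects the real setting.

```python
# Figures copied from the cluster health output above.
health = {
    "number_of_data_nodes": 6,
    "active_primary_shards": 54110,
    "active_shards": 162326,  # relocating shards are already counted as active
    "initializing_shards": 0,
    "unassigned_shards": 4,
}


def shard_breakdown(h: dict) -> tuple[int, float, float]:
    """Return (total shard copies, copies per primary, active copies per data node)."""
    total = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    # This is an average. It equals 1 + number_of_replicas only if every index
    # uses the same replica count, which is an assumption here.
    copies_per_primary = total / h["active_primary_shards"]
    per_node = h["active_shards"] / h["number_of_data_nodes"]
    return total, copies_per_primary, per_node


total, copies, per_node = shard_breakdown(health)
print(f"total copies: {total}, copies per primary: {copies:.1f}, per node: {per_node:,.0f}")
```

With these numbers the total is 162,330 copies, exactly three per primary, which suggests one primary plus two replicas each. The 4 unassigned shards are therefore replicas, consistent with the yellow status. Each data node carries about 27,000 active shard copies.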

Example of a queued task:

insert_order: 204469,
priority: "URGENT",
source: "shard-started ([c52352ec-7d62-436e-9f74-3e4d3aa0a344][4], node[_kuf5-IYQLKA3YXo9fs_DQ], relocating [XaH-gPDTQRWjdP09dfIuqw], [P], s[INITIALIZING]), reason [after recovery (replica) from node [[Some_Node][XaH-gPDTQRWjdPZJauIuqw][inet[/]]]]",
time_in_queue_millis: 10555125,
time_in_queue: "2.9h"
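The fields of this task match the output of Elasticsearch's GET _cluster/pending_tasks API, which wraps tasks in a {"tasks": [...]} envelope. With more than 4,000 entries, it can help to see which kinds of tasks dominate the queue. The sketch below groups tasks by type and priority and reports the oldest wait. It assumes the task type is the text before the first " (" in `source`, which holds for the example above but may not hold for every task. `summarize_pending` is a hypothetical helper, not part of any Elasticsearch client.

```python
from collections import Counter


def summarize_pending(response: dict) -> tuple[Counter, Counter, float]:
    """Group pending tasks by type and priority; return the oldest wait in hours."""
    tasks = response.get("tasks", [])
    # Assumption: the task type is the text before the first " (" in `source`,
    # e.g. "shard-started (...)". Sources without " (" are kept whole.
    by_type = Counter(t["source"].split(" (", 1)[0] for t in tasks)
    by_priority = Counter(t["priority"] for t in tasks)
    oldest_ms = max((t["time_in_queue_millis"] for t in tasks), default=0)
    return by_type, by_priority, oldest_ms / 3_600_000  # ms -> hours


# The single example task from the post. For the real queue, load the saved
# API response with json.load() instead.
sample = {
    "tasks": [
        {
            "insert_order": 204469,
            "priority": "URGENT",
            "source": (
                "shard-started ([c52352ec-7d62-436e-9f74-3e4d3aa0a344][4], "
                "node[_kuf5-IYQLKA3YXo9fs_DQ], relocating [XaH-gPDTQRWjdP09dfIuqw], "
                "[P], s[INITIALIZING]), reason [after recovery (replica) from node "
                "[[Some_Node][XaH-gPDTQRWjdPZJauIuqw][inet[/]]]]"
            ),
            "time_in_queue_millis": 10555125,
            "time_in_queue": "2.9h",
        }
    ]
}

by_type, by_priority, oldest_h = summarize_pending(sample)
print(by_type.most_common(5), dict(by_priority), f"oldest: {oldest_h:.1f}h")
```

For the single example task, the oldest wait works out to about 2.9 hours, matching the time_in_queue field.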

(Christian Dahlqvist) #2

That is a VERY large number of shards for a 6-node cluster. I would recommend you reconsider your indexing/sharding strategy to reduce the number of indices and shards considerably.

How much data do you have in the cluster? What is your average shard size?

(Jason) #3

@Christian_Dahlqvist: You are absolutely right. We are in the process of re-architecting our data models now. We actually have a lot of empty shards. Our total data size is only 3.2GB for this cluster.
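The average shard size that Christian asked about can be estimated from the figures in this thread. It is not clear whether "3.2GB" is the total store size across all copies or primary data only, so the sketch below computes both readings. It also assumes GB means GiB.

```python
GIB = 1024 ** 3

# From this thread: 3.2GB of data, 54,110 primaries, 162,326 active shard copies.
# Assumption: "GB" means GiB (1024**3 bytes).
data_bytes = 3.2 * GIB
primaries = 54110
active_copies = 162326


def avg_kib(total_bytes: float, shard_count: int) -> float:
    """Average size per shard in KiB."""
    return total_bytes / shard_count / 1024


# Reading 1: 3.2GB is the total across primaries and replicas.
avg_if_total = avg_kib(data_bytes, active_copies)
# Reading 2: 3.2GB is primary data only (replicas are the same size again).
avg_if_primary_only = avg_kib(data_bytes, primaries)
print(f"~{avg_if_total:.1f} KiB (total reading), ~{avg_if_primary_only:.1f} KiB (primary-only reading)")
```

Under either reading, the average shard holds only tens of kilobytes: about 20.7 KiB in the first case and about 62 KiB in the second.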

Any idea how we can get the stuck tasks moved forward?

(Mark Walkom) #4

Wow, you just broke the record for the largest number of shards I have seen :wink:

(On a cluster that small)

