Update_tsdb_data_stream_end_times task stuck pending

I have a pending task named update_tsdb_data_stream_end_times and don't know what it is or what effect it may have. It appears to be a task created by UpdateTimeSeriesRangeService.java. Any idea why it might have gotten stuck? Could this stuck task be related to my other post, Node fails but cluster holds no election and no failover occurs, where pending tasks had been accumulating for up to 24 hours before our cluster became unusable, even though only one Elasticsearch instance went down? The cluster in this post is a different one that also had a long-running pending task; I'm trying to understand what this task is and whether it could be related.
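From the task name and UpdateTimeSeriesRangeService, I'm assuming this only concerns data streams in time-series (TSDB) mode, so I have been looking at the index.mode and index.time_series.* settings on our backing indices. A rough sketch of what I'm running (the backing index name is a placeholder):

# List data streams, then inspect one backing index of a suspected TSDB data stream.
$ curl -u <user>:"$password" "localhost:9200/_data_stream?pretty"

# Backing indices of a TSDB data stream carry index.mode=time_series plus the
# start/end time boundaries that I assume this task is supposed to advance:
$ curl -u <user>:"$password" \
    "localhost:9200/<backing-index>/_settings?pretty&filter_path=*.settings.index.mode,*.settings.index.time_series"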

/_cluster/pending_tasks

{
  "tasks": [
    {
      "insert_order": 50312,
      "priority": "URGENT",
      "source": "update_tsdb_data_stream_end_times",
      "executing": false,
      "time_in_queue_millis": 7296762,
      "time_in_queue": "2h"
    }
  ]
}
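As a next step, my plan is to capture hot threads from the elected master while this task is sitting in the queue, to see whether the cluster-state update thread is busy or blocked. A sketch (same placeholder credentials as the health check further down):

# Hot threads of the currently elected master node:
$ curl -u <user>:"$password" "localhost:9200/_nodes/_master/hot_threads?threads=9999"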

Same as the issue in the linked thread: in this environment, the Elasticsearch logs are flooded with other, more recent tasks failing with ProcessClusterEventTimeoutException:

[2024-11-14T18:31:04,436][WARN ][rest.suppressed          ] [<redacted node-1>] path: /designer-objects-ia/_settings, params: {master_timeout=30s, index=designer-objects-ia, timeout=30s}, status: 503
org.elasticsearch.transport.RemoteTransportException: [<redacted node-2>][<redacted-ip>][indices:admin/settings/update]
Caused by: org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings
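The failing requests themselves look like ordinary settings updates, which I read as the update-settings cluster-state task waiting in the master's queue longer than its timeout (the params in the log show the default master_timeout=30s). For illustration, a sketch of what one of those calls presumably looks like, with a placeholder settings body since I don't know exactly which setting the application was changing:

$ curl -u <user>:"$password" -X PUT \
    "localhost:9200/designer-objects-ia/_settings?master_timeout=30s&timeout=30s" \
    -H 'Content-Type: application/json' \
    -d '{ "index": { "refresh_interval": "30s" } }'   # placeholder settings body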

Using JRE 17.

Small update: I have found that another customer whose pending tasks did not clear also ended up with a problematic cluster. In this other case, I also see a problem with a master node even though the cluster reports green status on node-2:

[WARN ][org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper] [Node-1] master not discovered or elected yet, an election requires 2 nodes with ids [6knpkBjYQ9upI7J_mvOFDQ, w
xkZiYmyRyqRDn3Jn6HO-A], have discovered possible quorum [{<REDACTED>}{wxkZiYmyRyqRDn3Jn6HO-A}{WZyccCOIQ06HSsQDYQ5Z2w}{Node-1}{Node-1}{<REDACTED IP>:9300}{cdfhilmrstw}{8.14.1}{7000099-8505000}, {Node-2}{UwPrZLw2ThG5YkkjcQBfGA}
{XUEtbcqtTVmbUFu8VK20gg}{Node-2}{Node-2}{<REDACTED IP>:9300}{cdfhilmrstw}{8.14.1}{7000099-8505000}, {Node-3}{6knpkBjYQ9upI7J_mvOFDQ}{ozXWe1vdQvynemXJ_GnixA}{Node-3}{Node-3}{<REDACTED IP>:9300}{cdfhilmrstw}{8.14.1}{70
00099-8505000}] who claim current master to be [{Node-2}{UwPrZLw2ThG5YkkjcQBfGA}{XUEtbcqtTVmbUFu8VK20gg}{Node-2}{Node-2}{<REDACTED IP>:9300}{cdfhilmrstw}{8.14.1}{7000099-8505000}]; discovery will continue using [17
<REDACTED IP>:9300, <REDACTED IP>:9300] from hosts providers and [{Node-1}{wxkZiYmyRyqRDn3Jn6HO-A}{WZyccCOIQ06HSsQDYQ5Z2w}{Node-1}{Node-1}{<REDACTED IP>:9300}{cdfhilmrstw}{8.14.1}{7000099-8505000}] from last-known cluster
state; node term 3, last-accepted version 0 in term 0; joining [{Node-2}{UwPrZLw2ThG5YkkjcQBfGA}{XUEtbcqtTVmbUFu8VK20gg}{Node-2}{Node-2}{<REDACTED IP>:9300}{cdfhilmrstw}{8.14.1}{7000099-8505000}] in term [3] has st
atus [waiting for response] after [41.1m/2469981ms]; for troubleshooting guidance, see https://www.elastic.co/guide/en/elasticsearch/reference/8.14/discovery-troubleshooting.html
[2024-11-27T11:33:54,681][WARN ][rest.suppressed          ] [Node-1] path: /_security/role/health_check_role, params: {name=health_check_role}, status: 503
org.elasticsearch.ElasticsearchStatusException: Cluster state has not been recovered yet, cannot write to the [null] index
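To compare what each node believes about the elected master (node-1's log above shows its join request to node-2 "waiting for response" for over 41 minutes), this is the kind of per-node check I would run, with local=true so each node answers from its own state instead of forwarding to the master:

$ curl -u <user>:"$password" "localhost:9200/_cat/master?v"
$ curl -u <user>:"$password" "localhost:9200/_cluster/state/master_node?local=true&pretty"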

From node-2:

$ curl -u <user>:"$password" localhost:9200/_cluster/health?pretty
{
  ...
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 13,
  "active_shards" : 37,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 9,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 1094899301,
  "active_shards_percent_as_number" : 100.0

I could not retrieve the pending tasks list from this customer, but the task_max_waiting_in_queue_millis above (1094899301 ms) works out to roughly 12.7 days.

However, one more customer encountered a pending tasks list that did not shrink, and their list also showed update_tsdb_data_stream_end_times getting stuck.

Questions:

  1. Is there any logging that would show why this particular task is getting stuck?
  2. What is this task and what triggers it? As far as I know, it is not started by our product.