A major issue with cluster state handling and persistent tasks cancellation

We are using ES 9.3.0.

Our cluster has many data streams and indices. The cluster state is ~350 MB (compressed on disk) under normal conditions.

I mistakenly scheduled a large number of downsampling tasks for historical data, which created ~2,200 persistent tasks.

After that, I canceled the tasks and removed the ILM policy from the data stream. However, logs show that each canceled task triggers a full cluster state update (why? I just canceled the tasks).

Each update currently takes ~20 seconds, making the cluster effectively unusable.

[2026-04-24T13:55:53,612][WARN ][o.e.g.PersistedClusterStateService] [hssf43] writing cluster state took [21473ms] which is above the warn threshold of [10s]; [wrote] global metadata, wrote [0] new mappings, removed [0] mappings and skipped [1514] unchanged mappings, wrote metadata for [0] new indices and [0] existing indices, removed metadata for [0] indices and skipped [2219] unchanged indices

[2026-04-24T13:56:44,213][WARN ][o.e.g.PersistedClusterStateService] [hssf43] writing cluster state took [21408ms] which is above the warn threshold of [10s]; [wrote] global metadata, wrote [0] new mappings, removed [0] mappings and skipped [1514] unchanged mappings, wrote metadata for [0] new indices and [0] existing indices, removed metadata for [0] indices and skipped [2219] unchanged indices

[2026-04-24T13:57:35,750][WARN ][o.e.g.PersistedClusterStateService] [hssf43] writing cluster state took [21693ms] which is above the warn threshold of [10s]; [wrote] global metadata, wrote [0] new mappings, removed [0] mappings and skipped [1514] unchanged mappings, wrote metadata for [0] new indices and [0] existing indices, removed metadata for [0] indices and skipped [2219] unchanged indices

(and so on, one near-identical warning roughly every 50 seconds)

Is there a way to speed up this process or mitigate the impact?

p.s. Currently there's no write activity from users.

Persistent tasks are recorded in the cluster state, so cancelling a persistent task requires a cluster state update. These are in fact not "full" cluster state updates; they're incremental (see "skipped [1514] unchanged mappings" and "skipped [2219] unchanged indices" in your logs), but that doesn't really help you here.
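As a minimal sketch of what that means (the field layout below follows what GET _cluster/state/metadata returns under metadata.persistent_tasks, but the task data itself is invented for illustration): removing each finished task produces a new cluster-state version that the master must persist, so ~2,200 cancellations imply ~2,200 state writes.

```python
# Toy model of how a persistent-task cancellation implies a cluster state update.
# The layout mirrors metadata.persistent_tasks in the cluster state; the task
# ids and contents here are made up.
import copy

cluster_state = {
    "version": 100,
    "metadata": {
        "persistent_tasks": {
            "tasks": [
                {"id": "downsample-task-a", "task": {}},
                {"id": "downsample-task-b", "task": {}},
            ],
        }
    },
}

def finish_task(state, task_id):
    """Removing a persistent task yields a *new* state version that the
    master must persist to disk - one write per cancelled task."""
    new_state = copy.deepcopy(state)
    tasks = new_state["metadata"]["persistent_tasks"]["tasks"]
    new_state["metadata"]["persistent_tasks"]["tasks"] = [
        t for t in tasks if t["id"] != task_id
    ]
    new_state["version"] += 1
    return new_state

after = finish_task(cluster_state, "downsample-task-b")
print(after["version"])                                      # 101
print(len(after["metadata"]["persistent_tasks"]["tasks"]))   # 1
```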

I don't have any other suggestions beyond waiting for these cancellations to complete. At the current rate (~2,200 tasks at roughly 20 s per state write) I'd guess it'll be done in 12 hours or so, although the writes should get faster as the number of remaining tasks decreases.

Hi, David!

Thank you for answering.

If this is supposed to be a delta update, why does each cluster state update result in a new segment being written to disk that is about 90–95% of the total state size?

And one more clarification question: when all tasks have disappeared from the Tasks API and there is no ILM policy that created them, can we be sure they won’t be restored after a cluster restart?

It's the [wrote] global metadata bit - this is where persistent tasks are stored, and it's not subdivided so we have to rewrite the whole thing on each change.
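A toy sketch of what that log line reflects (this is not the real Lucene-based writer, just an illustration of the split): the persisted state has per-index entries that can be skipped when unchanged, plus one undivided "global metadata" blob that contains the persistent tasks, so any task change forces rewriting the whole blob.

```python
# Illustrative sketch only: per-index metadata is skippable when unchanged,
# but the global metadata (which holds persistent tasks) is one blob that is
# re-serialized in full on any change to it.
import json

def write_state(state, previous_serialized_indices):
    written, skipped = [], 0
    # Per-index metadata: only rewrite entries that actually changed.
    for name, index_md in state["indices"].items():
        serialized = json.dumps(index_md, sort_keys=True)
        if previous_serialized_indices.get(name) == serialized:
            skipped += 1
        else:
            written.append(name)
    # Global metadata (incl. persistent tasks) is not subdivided:
    # even a one-task change means serializing all of it again.
    global_blob = json.dumps(state["global"], sort_keys=True)
    return written, skipped, len(global_blob)

state = {
    "indices": {"idx-a": {"settings": 1}, "idx-b": {"settings": 2}},
    "global": {"persistent_tasks": ["task-1", "task-2"]},
}
prev = {n: json.dumps(md, sort_keys=True) for n, md in state["indices"].items()}

state["global"]["persistent_tasks"].pop()          # cancel one task
written, skipped, blob_size = write_state(state, prev)
print(written, skipped)   # [] 2  -> all index metadata skipped...
print(blob_size > 0)      # True  -> ...but the global blob is rewritten whole
```

This matches the shape of the warning above: "wrote global metadata ... skipped [2219] unchanged indices".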

You'd need to watch GET _cluster/state to see the actual state of these persistent tasks.

You'd need to watch GET _cluster/state

Not sure I understand the situation. I canceled the tasks, and they disappeared from the _tasks API results, but they are still present in the cluster state.

grep -c rollup-shard cluster_state_full.json
2234

If it matters, ILM is stopped globally.
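For a more structured count than a raw grep, the dump can be parsed and the entries under metadata.persistent_tasks counted directly; a sketch with the json stdlib, using invented in-memory data in place of cluster_state_full.json:

```python
# Count persistent tasks by structure rather than by grepping raw text.
# With a real dump you would load it first, e.g.:
#     with open("cluster_state_full.json") as f:
#         state = json.load(f)
# Here an invented in-memory state stands in for the file.
import json

state = {
    "metadata": {
        "persistent_tasks": {
            "tasks": [
                {"id": "downsample-task-a"},
                {"id": "downsample-task-b"},
                {"id": "some-other-task"},
            ]
        }
    }
}

tasks = state["metadata"]["persistent_tasks"]["tasks"]
downsample = [t for t in tasks if "downsample" in t["id"]]
print(len(downsample))   # 2
```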

There are (at least?) three different things called "tasks" in Elasticsearch. The ones in GET _tasks are things that are actively running in the system at that moment. The ones in the cluster state are persistent tasks, which normally correspond to an active task but may not be assigned to a node at a given time. Then there's GET _cluster/pending_tasks, which lists queued cluster state updates. It's confusing indeed.

According to the pending tasks API, I see my not-yet-canceled tasks as:

"source": "update project [default] task state [downsample-downsample-5m-.ds-metrics-otelcol.v1-devops-2025.10.12-000106-1-5m]",

and tasks that have already been updated (after cancellation) as:

"source": "finish project [default] persistent task [downsample-downsample-5m-.ds-metrics-otelcol.v1-devops-2025.12.22-000276-2-5m] (success)",

So, do I need to do anything additional to completely remove them from the cluster state? I'm concerned that they might be restored later.