Stuck "Cancelled Tasks" In ElasticSearch 8.6.2 causing Cluster failure

DavidTurner · July 3, 2023, 6:47pm

Hmm, actually, having said that, I don't think it's a config issue. Linux has a silly default that means it can take 900+ seconds between a connection dropping and userspace being notified of the drop. This page of the manual has more details. If this were the cause, you'd see messages about dropped connections in the logs.
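To show where a figure like "900+ seconds" can come from, here is a rough sketch of the arithmetic. It assumes the default in question is the kernel's `net.ipv4.tcp_retries2` setting (default 15), and that the retransmission timeout starts at 200 ms and doubles on each retry up to a 120 s cap. The function name and parameters are mine, purely for illustration, and a real connection's timeout also depends on its measured round-trip time, so treat the result as an estimate.

```python
def tcp_retransmission_timeout(retries2, rto_min=0.2, rto_max=120.0):
    """Estimate the seconds before Linux gives up on an unacknowledged segment.

    Simplified model (an assumption, not the kernel's exact code path):
    the first wait is rto_min, each later wait doubles, and no single
    wait exceeds rto_max. There are retries2 + 1 waits in total.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff, capped
    return total


if __name__ == "__main__":
    for retries in (15, 5):
        seconds = tcp_retransmission_timeout(retries)
        print(f"tcp_retries2={retries}: about {seconds:.1f} s")
```

Under these assumptions the default of 15 works out to roughly 924.6 seconds, which matches the "900+ seconds" above, and a lower value shortens the wait considerably.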

You'd need to be a bit more precise about what you mean by "kill our cluster" though. The other recent thread on this topic has the zombie tasks consuming a lot of CPU, but that wouldn't happen if they were just waiting for a dead connection to time out. In that case they'd potentially hold on to a lot of heap, causing GC pressure, but wouldn't themselves consume any CPU.
