Elasticsearch Task API does not cancel tasks

Hi

I'm currently having a problem with my cluster. While trying to debug certain documents that were indexed, I made a query with a script in it and queued various requests of that search.

The problem is that now, when I try to cancel those tasks, the API reports them as cancelled, but they still appear in the list of running tasks and have been running for several hours now.
I tried to cancel the parent task along with its children, but they keep running.
Kibana can no longer connect to Elasticsearch, and the only way to do anything is through curl.
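For reference, the cancel calls I tried were along these lines (just a sketch; the parent task ID is the one that shows up in the task listing below, and the standard _tasks/_cancel endpoint is used both for the parent and, via parent_task_id, for its children):

# Cancel the parent search task by its task ID
curl -X POST "localhost:9200/_tasks/a98sSQVJRtefhq4egRBVkg:3308613042/_cancel?pretty"

# Cancel every child task that belongs to that parent
curl -X POST "localhost:9200/_tasks/_cancel?parent_task_id=a98sSQVJRtefhq4egRBVkg:3308613042&pretty"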

The Elastic Stack version is 6.2.4.
Running the following request:

curl -X GET "localhost:9200/_cat/tasks?v"

I get:

indices:data/read/search a98sSQVJRtefhq4egRBVkg:3308613042 - transport 1542808265542 10:51:05 2.8h 172.27.202.150 es1
indices:data/read/search[phase/query] a98sSQVJRtefhq4egRBVkg:3308613043 a98sSQVJRtefhq4egRBVkg:3308613042 direct 1542808265542 10:51:05 2.8h x.x.x.x es1
indices:data/read/search[phase/query] a98sSQVJRtefhq4egRBVkg:3308613044 a98sSQVJRtefhq4egRBVkg:3308613042 direct 1542808265542 10:51:05 2.8h x.x.x.x es1
indices:data/read/search[phase/query] a98sSQVJRtefhq4egRBVkg:3308613045 a98sSQVJRtefhq4egRBVkg:3308613042 direct 1542808265543 10:51:05 2.8h x.x.x.x es1
indices:data/read/search[phase/query] a98sSQVJRtefhq4egRBVkg:3308613046 a98sSQVJRtefhq4egRBVkg:3308613042 direct 1542808265543 10:51:05 2.8h x.x.x.x es1
indices:data/read/search[phase/query] LnF0A5gATiK3Fjy-cAUSLQ:2222282441 a98sSQVJRtefhq4egRBVkg:3308613042 netty 1542808265596 10:51:05 2.8h x.x.x.x es3
indices:data/read/search[phase/query] LnF0A5gATiK3Fjy-cAUSLQ:2222282439 a98sSQVJRtefhq4egRBVkg:3308613042 netty 1542808265596 10:51:05 2.8h x.x.x.x es3
indices:data/read/search[phase/query] LnF0A5gATiK3Fjy-cAUSLQ:2222282442 a98sSQVJRtefhq4egRBVkg:3308613042 netty 1542808265597 10:51:05 2.8h x.x.x.x es3
indices:data/read/search[phase/query] xIZT5fVqQuugw2qYbBRrXA:3595912068 a98sSQVJRtefhq4egRBVkg:3308613042 netty 1542808265607 10:51:05 2.8h x.x.x.x es2
indices:data/read/search[phase/query] xIZT5fVqQuugw2qYbBRrXA:3595912066 a98sSQVJRtefhq4egRBVkg:3308613042 netty 1542808265607 10:51:05 2.8h x.x.x.x es2
indices:data/read/search[phase/query] xIZT5fVqQuugw2qYbBRrXA:3595912071 a98sSQVJRtefhq4egRBVkg:3308613042 netty 1542808265608 10:51:05 2.8h x.x.x.x es2

and running this request:

curl -X GET "localhost:9200/_tasks?actions=*search&detailed&pretty'

I get several entries like the following:

"a98sSQVJRtefhq4egRBVkg:3309107911" : {
"node" : "a98sSQVJRtefhq4egRBVkg",
"id" : 3309107911,
"type" : "transport",
"action" : "indices:data/read/search",
"description" : "indices[.kibana], types, search_type[QUERY_THEN_FETCH], source[{"from":0,"size":10000,"query":{"bool":{"filter":[{"term":{"type":{"value":"index-pattern","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"_source":{"includes":["index-pattern.title","type","title"],"excludes":}}]",
"start_time_in_millis" : 1542809005112,
"running_time_in_nanos" : 9505448128013,
"cancellable" : true,
"headers" : { }
},
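For anyone hitting the same thing, the task list can also be narrowed down to just the children of the stuck parent (a sketch, reusing the parent task ID from the first listing):

curl -X GET "localhost:9200/_tasks?parent_task_id=a98sSQVJRtefhq4egRBVkg:3308613042&detailed&pretty"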

I need to terminate these tasks to make the cluster usable again. Any direction on how to solve this problem would be appreciated.

Thanks in advance.

Rod

Anyone?

If you cancel a task and it doesn't go away "soon", that is a bug. What kind of bug, or how we'd reproduce it, I don't know. These things happen because cancellation is "cooperative", due to the constraints of the JVM: tasks have to notice that they have been cancelled and then shut down. There is no way to forcibly cancel a task.

I'd file an issue with, if possible, the results of running the hot_threads API and, if possible, a jstack thread dump. That would tell us whether the search query is spending a long time in code that isn't paying attention to the cancellation. We expect it to spend some time in that code, but not a long time.
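Something along these lines should capture both (a sketch; es1 is the node name from your _cat/tasks output, and <es-pid> is a placeholder for the Elasticsearch process id on that host):

# Hot threads for the node that owns the stuck parent task
curl -X GET "localhost:9200/_nodes/es1/hot_threads?threads=9999" > hot_threads.txt

# Full JVM thread dump, taken on the node itself
jstack <es-pid> > jstack.txt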

I'm intentionally vague about the times here because, well, I don't remember all of this code, and also because the times depend on the size of the workload and things like that.

Thanks for the reply, Nik.

We had to restart the cluster to restore it, but the problem hasn't occurred again.
It's a production cluster; if it happens again (I hope not), I will gather the pertinent data and file the issue you suggested.

Thanks again for the reply; I will keep this thread posted if it happens again.

Regards
