Get tasks hang and accumulate on 5.4.2


I have a problem with our Elasticsearch 5.4.2 cluster.
We have 52 nodes, of which 40 are data nodes. On our non-master data nodes, tasks are accumulating and getting stuck. A full cluster restart did not solve the problem; the accumulating tasks came back quickly. If we restart a node that has the problem, it just transfers to other nodes like a disease.
The detailed task list shows entries like this one:

         "node" : "Q_pZKTu4R9-wldTlrPsLcA",
          "id" : 29589541,
          "type" : "netty",
          "action" : "indices:data/read/get",
          "description" : "",
          "start_time_in_millis" : 1499508299639,
          "running_time_in_nanos" : 16858865861967,
          "cancellable" : false

The problem first surfaced after we upgraded the cluster to 5.4; we had never seen it before.

Does your application update a document before it tries to retrieve it with the Get API?

No, it's just a get after a search.

I asked because the realtime get changed recently: it now does a refresh before getting the document if the document has changed since the last refresh.

But since you are not updating documents, it is not clear to me why the task list is filling up with get actions.

Maybe @jasontedor knows more?

I think we need to see the output of the hot threads API and also the output of /_nodes/stats?filter_path=**.thread_pool.get.
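One way to use the thread-pool stats is to take two snapshots of /_nodes/stats?filter_path=**.thread_pool.get a minute or so apart and see whether the get queue is growing and requests are being rejected. A minimal sketch (the field names follow the nodes-stats response format; the node ID is reused from the task entry above and the numbers are invented for illustration):

```python
# Hypothetical snapshots of GET /_nodes/stats?filter_path=**.thread_pool.get
# taken a minute apart; the stats values are made up for illustration.
snapshot_1 = {
    "nodes": {
        "Q_pZKTu4R9-wldTlrPsLcA": {"thread_pool": {"get": {
            "threads": 16, "active": 16, "queue": 120,
            "rejected": 0, "completed": 1000000}}},
    }
}
snapshot_2 = {
    "nodes": {
        "Q_pZKTu4R9-wldTlrPsLcA": {"thread_pool": {"get": {
            "threads": 16, "active": 16, "queue": 950,
            "rejected": 40, "completed": 1000150}}},
    }
}

def get_pool_delta(before, after):
    """Per-node change in the get thread pool between two stats snapshots."""
    deltas = {}
    for node_id, stats in after["nodes"].items():
        g_after = stats["thread_pool"]["get"]
        g_before = before["nodes"][node_id]["thread_pool"]["get"]
        deltas[node_id] = {
            "queue_growth": g_after["queue"] - g_before["queue"],
            "new_rejections": g_after["rejected"] - g_before["rejected"],
            "completed": g_after["completed"] - g_before["completed"],
        }
    return deltas

for node_id, delta in get_pool_delta(snapshot_1, snapshot_2).items():
    print(node_id, delta)
```

A rapidly growing queue with few completions on a node whose get threads are all active would point at the same stuck-task symptom visible in the task list.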

The cluster was completely restarted ~7 hours ago, so there are not too many hung tasks yet (~17k at the moment), but the application hangs after a couple of minutes with ES read timeouts, and nothing shows up in Elasticsearch's logs. The application itself did not change.

Hot threads:


BTW, for search and other operations the task API shows which index is involved, but for get it doesn't. Is that intentional?
It might be relevant which index has the problem (if it can be narrowed down to one).

After a truly complete restart (master nodes included this time), the accumulation of tasks seems to have stopped. Still, I would be a lot calmer if I knew what caused this phenomenon.

I've opened an issue about this:

because we still don't know what happened or how it could be solved (apart from taking every possible load off Elasticsearch and doing several cluster restarts).
Please tell me if we can provide more info to help sort this out.
