Auto_import_dangled for ES 6.3.X


#1

Hi there,

I am having some issues with dangling indices. The scenario seems to be:

*) Data node went down, was down for a while.
*) Life went on, some indexes were deleted from cluster.
*) Data node came back, saw indices in the local filesystem that were no longer in the cluster, and auto-imported them into the cluster state.
*) At this point my pending tasks slowly fill up (GET /_cluster/health?pretty), with lots of "allocation dangled indices" tasks.

Overall cluster performance degrades to the point that it becomes unusable.
Some time ago there was an auto_import_dangled configuration setting that allowed us to bypass this import of dangling indices. It seems to have disappeared in 6.3.1.
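For anyone following along, the backlog described above can be watched directly with the pending-tasks API (the cluster address below is a placeholder):

```shell
# List the master's pending task queue; each entry shows its source
# (e.g. "allocation dangled indices"), its priority, and how long it
# has been waiting. http://localhost:9200 is a placeholder address.
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'

# Rough count of how many queued tasks are dangling-index imports.
curl -s 'http://localhost:9200/_cluster/pending_tasks' | grep -c 'dangled'
```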

Any ideas on what I should do to either:

*) Avoid the dangling-index import in the first place?
*) Prevent the pending tasks from filling up with these events?

I know which data node is causing this issue; I could wipe that node's data and restart it, and I'm guessing that would solve my problem. But it seems a bit extreme, as all of its data would need to be recreated. There has to be a way to have the data node rejoin without this issue, no?

Thanks,

Francisco.


#2

In the logs of the data node, when it starts up, I see a lot of these:

[2018-11-07T16:07:43,635][INFO ][o.e.g.DanglingIndicesState] [data-1] [[idx1]] dangling index exists on local file system, but not in cluster metadata, auto import to cluster state


(David Turner) #3

If you'd known in advance that you were going to delete a lot of indices while a node was missing, you could have increased the size of the index graveyard beforehand.
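In 6.x the graveyard size is, I believe, controlled by the `cluster.indices.tombstones.size` setting (default 500 tombstones); a hedged sketch of raising it in `elasticsearch.yml`, with an illustrative value:

```yaml
# elasticsearch.yml — raise the index graveyard ("tombstone") capacity
# before a large batch of deletions, so a returning node can still see
# which of its local indices were deleted. 500 is the default; 2000 is
# an illustrative value, not a recommendation from this thread.
cluster.indices.tombstones.size: 2000
```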

Alternatively, if you'd realised that a lot of deletions had occurred, you could have wiped the data node before it started up. This would involve rebuilding any existing shards, but many of them may require rebuilding anyway, or will have been allocated to another node and will therefore be deleted when the failed node rejoins.

However, now that the dangling indices have been imported, the best path forward is to delete each unwanted index, guided by the log messages quoted in your second message.
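A hypothetical sketch of that cleanup, assuming log lines shaped like the one quoted in the second message; the log path and cluster address are placeholders, and the resulting list should be reviewed carefully before anything is deleted:

```shell
# Placeholders — adjust to your environment.
LOG=/var/log/elasticsearch/my-cluster.log
ES_URL=http://localhost:9200

# Pull unique index names out of "dangling index" log lines like:
#   [data-1] [[idx1]] dangling index exists on local file system, ...
extract_dangling() {
  grep 'dangling index exists' |
    sed -E 's/.*\[\[([^]]+)\]\].*/\1/' |
    sort -u
}

# Delete each unwanted imported index (review the list first!).
[ -f "$LOG" ] && extract_dangling < "$LOG" | while read -r idx; do
  curl -s -X DELETE "$ES_URL/$idx"
done
```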

I don't understand why importing dangling indices is slowing your cluster down.


#4

Thanks for your answer David.

Yes, I see that I could have wiped the data on the node before bringing it back... So there's no more auto_import_dangled param or anything like that?

This cluster has a LOT of activity. In particular, there is a lot of index creation/deletion going on. What I see is that the pending-tasks list is always populated with creation/deletion/mapping tasks, and that the dangling-index allocation tasks keep being added without being resolved. Maybe because their priority is NORMAL and other tasks take precedence? In any case, the pending-tasks list keeps growing, and any other task (e.g. state querying) now takes longer to complete.

Is there any config param I can tweak to purge pending tasks faster (e.g. something to recreate more indices in parallel, or thread-pool settings to give some operations more worker threads)?

Thanks,

Francisco.


#5

I dove deeper into the link you sent, and I see how the index graveyard could help. I think part of the issue is that this cluster is creating/deleting a ton of indices, which is probably not a very good idea. The size of the index graveyard is going to be insufficient for that number of deletes, but I guess I can balance the graveyard size against the acceptable downtime of a data node. This node was down for a while, which is no doubt why, when it came back, the deletions it had missed went way beyond what the index graveyard still recorded. Very helpful, David!
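The trade-off above can be put as back-of-envelope arithmetic (the numbers here are illustrative assumptions, not figures from this cluster): the graveyard must hold at least as many tombstones as indices deleted during the longest node outage you want to tolerate.

```shell
# Illustrative sizing: tombstones needed =
#   deletions per hour x longest tolerated outage in hours.
deletions_per_hour=100   # assumed index deletion rate
max_outage_hours=24      # longest node downtime to survive
needed=$((deletions_per_hour * max_outage_hours))
echo "cluster.indices.tombstones.size >= $needed"
```

If the product exceeds what you're willing to configure, the alternative remains wiping a long-absent node's data before it rejoins.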

From my previous message, any way I can allocate more resources for task processing?

Thanks so much,

Francisco.


(David Turner) #6

As far as I can see, this parameter was removed in #10016 in March 2015, so it hasn't existed since version 1.7.6.

The pending tasks are processed by a single thread on the master. This is by design - they can't safely happen in parallel.

This sounds like a bad idea. Why are you creating/deleting indices so frequently? It sounds like the master can't really keep up. Indexing and searching generally don't involve the master, so they scale out with the number of data nodes, but creating and deleting indices requires coordination, so there's a limit to how much of this a single cluster can do.


#7

I do think you pointed me in the right direction, great information, thanks so much David!


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.