Already deleted indices come back as dangling whenever a node restarts

Hi,
I am running an 8-node ES cluster on version 7.4.2.
Every time a node in the cluster restarts, the cluster state goes red because a lot of dangling indices that were already deleted try to get restored onto the cluster.
The only solution is to manually delete all those indices again to make the cluster green.
How do I solve this permanently?

Did you try with a more recent version? 7.10.0?

Do you manually stop the node?

No, actually the ES nodes are running on AWS spot machines, so they sometimes get replaced, i.e. they go down and come back up.

A similar thing happened again: 4000 unassigned shards due to dangling indices.
_cluster/allocation/explain output:

"shard" : 5,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "DANGLING_INDEX_IMPORTED",
    "at" : "2020-11-27T08:45:39.982Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

For now I have fixed the cluster by deleting those indices manually.
If these outputs will help, I can manually mark an instance down and cause the cluster to go red again. Please let me know.

I assume you have a number of stable nodes that act as master nodes, and that only part of the cluster is on spot instances?

I have an 8-node cluster in which 7 are master+data nodes and only 1 is a master-only node.
Basically all are master-eligible nodes.
Also, all 8 nodes are on spot instances.

Having nodes come and leave the cluster like that can probably be problematic. If you ever lost more than half of the master-eligible nodes at once you would be in serious trouble. The few times I have seen spot instances used, it has, as far as I can remember, usually involved keeping a small set of master-eligible nodes on non-spot instances and dedicated data nodes on spot instances.
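
As a rough sketch of that layout on 7.x (not your exact config, and the counts are just an example), the non-spot nodes would be dedicated master-eligible nodes and the spot nodes data-only, e.g. in elasticsearch.yml:

# on the small non-spot nodes (dedicated master-eligible)
node.master: true
node.data: false
node.ingest: false

# on the spot nodes (data only)
node.master: false
node.data: true
node.ingest: true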

Got your point, but how can I fix this now? It is a permanent issue whenever any node restarts.
How can it be solved permanently? Is there any way I can delete the state from the cluster that keeps trying to bring those deleted shards back?
Or is there any setting to switch off dangling index reassignment in version 7.4?

Yes please.

Please share your application logs from when you delete the dangling indices.

That can help identify which node is holding the problematic indices.

You should choose an odd number of master nodes (like 1, 3, 5, 7, ...) for the cluster.

You have eight master nodes in the cluster; that is not recommended.

I assume you have changed a node role, for example from master node to master/data node. That is what causes the dangling indices to be created.

This is not true; 3 is the recommended number but if you don't want to follow that advice then it doesn't really matter whether the number you pick is even or odd.

I think there's something wrong in the OP's orchestration. Deleting an index will delete all the files on disk, but since these are spot instances that seem to get randomly resurrected I suspect they're reverting to an older state.

I recommend first confirming that the index really is being deleted from disk. You can use GET _cat/indices to get the index UUID, and then use find to verify that all directories named after that UUID are gone after the DELETE. For instance, here's me deleting an index called indexname with UUID 8M4Kpm5ERuC7q-4hGMeYBA:

$ curl 'http://localhost:9200/_cat/indices/indexname?v'
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   indexname 8M4Kpm5ERuC7q-4hGMeYBA   1   1          0            0       208b           208b
$ find elasticsearch-7.10.0/data-0 -name 8M4Kpm5ERuC7q-4hGMeYBA
elasticsearch-7.10.0/data-0/nodes/0/indices/8M4Kpm5ERuC7q-4hGMeYBA
$ curl -XDELETE 'http://localhost:9200/indexname'
{"acknowledged":true}
$ find elasticsearch-7.10.0/data-0 -name 8M4Kpm5ERuC7q-4hGMeYBA
$ # didn't find anything

You'll need to check that on all 8 nodes. If the directory is being deleted then Elasticsearch will never create it again, so if it comes back again it's not Elasticsearch's doing.
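
If it helps, here is a rough way to run that check on every node at once, assuming you have SSH access and the data path is /var/lib/elasticsearch (the host names are placeholders; adjust both to your setup):

$ # UUID is the one reported by _cat/indices before the DELETE
$ for host in node1 node2 node3 node4 node5 node6 node7 node8; do
    echo "== $host =="
    ssh "$host" 'find /var/lib/elasticsearch -type d -name 8M4Kpm5ERuC7q-4hGMeYBA'
  done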

Also, if you upgrade to a more recent version you can list and delete dangling indices directly via APIs (see List dangling indices API | Elasticsearch Guide [8.11] | Elastic). This isn't a permanent fix, nor does it explain what's going on, but it is at least an improvement.
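
For reference, those APIs look roughly like this on a recent enough version (the UUID below is whatever the list call reports; deleting requires explicitly accepting data loss):

$ curl 'http://localhost:9200/_dangling?pretty'
$ curl -XDELETE 'http://localhost:9200/_dangling/8M4Kpm5ERuC7q-4hGMeYBA?accept_data_loss=true'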

IMPORTANT EDIT: don't infer from this that you can delete anything from the data directory yourself -- deleting an index is more than just deleting this one folder, and you should never even consider modifying the contents of the data directory by hand. What I suggest above is just observing that Elasticsearch really does delete the directory.


If we choose an odd number of master nodes then we can avoid split brain, using the minimum master nodes parameter: n/2+1 (where n is the number of master nodes).

That's why I suggested this.
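
As a worked example of that formula with the 8 master-eligible nodes from this thread (it only applies to versions before 7.0):

# pre-7.0 elasticsearch.yml only; 7.x ignores this setting
# quorum for 8 master-eligible nodes: 8/2 + 1 = 5
discovery.zen.minimum_master_nodes: 5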

I assume one node still has the index metadata, and that is why the dangling indices get created.

I have faced this myself.

This does not require an odd number of master nodes.

The discovery.zen.minimum_master_nodes setting has no effect in this version.
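
On 7.x the elected master maintains the voting configuration itself. If you want to inspect it, something like this shows the currently committed voting configuration:

$ curl 'http://localhost:9200/_cluster/state/metadata?filter_path=metadata.cluster_coordination&pretty'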

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.