Dangling Index after running Curator


(James Barwick) #1

I have a situation where deleted indexes are not removed from all nodes. As a result, when ES is restarted, the leftover shard is imported as a dangling index, which ends up UNASSIGNED, puts the cluster into "red" status, and then alarm bells start going off and things stop working.

I run filebeat/metricbeat and Curator. Curator deletes all "metricbeat" indexes older than 30 days and runs every night. My cluster stays green for days (not months, and this is a problem).

Then, with it already being July, ES may restart for some reason (a config update or some other cycle), and I get UNASSIGNED metricbeat indexes from January.

I then run Curator again, or manually delete these old indexes, and the cluster goes back to green. (As mentioned in other tickets, Curator won't necessarily see them if they are UNASSIGNED.)

How can I run a program (which I am happy to run on all nodes) that deletes from disk all index shards that don't have a corresponding index in ES, or that do have an index but are unassigned due to dangling_import? Any idea how to go about this? (I need to do both; I need to clean up the disk since ES is not deleting the files.)

(By the way, I saw the other topic regarding this from Feb 21, but this question isn't about Curator; it is about how I manually go about deleting these dangling indexes from the disk. They DO exist. If there is no fix for ES during index DELETE, then we need a workaround. So, how do we build the workaround?)


(Christian Dahlqvist) #2

It is strange that they do not get deleted across the cluster. What is the output of the cluster health and cat nodes APIs?


(James Barwick) #3
root@monitor:~# curl -XGET localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "perx-application",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 1373,
  "active_shards" : 2850,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

and

root@monitor:~# curl -XGET localhost:9200/_cat/nodes
172.16.15.182 55 83 10 1.02 1.02 0.81 mdi - monitor3
172.16.15.217 52 98  8 0.52 1.06 1.77 mdi - monitor1
172.16.16.190 56 97  8 0.46 0.56 0.66 mdi - monitor2
172.16.15.14   8 95  3 0.01 0.09 0.08 i   - monitor
172.16.16.173 56 98 19 0.87 1.14 1.17 mdi * monitor4

So, everything is green. But I am pretty sure those 'deleted' shards are still on the disk. They just keep coming back.

So, what I want to do is examine the disk, find a shard file, figure out whether it has an index in ES, and if not, delete it. I'd like to just manually delete all dangling indexes, but I don't know how to match the hashed directory name with an index name.
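The "hash" in the directory name is actually the index UUID, and `GET /_cat/indices?h=index,uuid` returns both columns, so the mapping can be done with a tiny lookup. A sketch, assuming the cat output has been saved to a file (the `uuid_to_index` helper name is my own, not an ES API):

```shell
#!/usr/bin/env bash
# uuid_to_index UUID CAT_FILE
# Given a directory name (an index UUID) and a file containing the output of
#   curl -s 'localhost:9200/_cat/indices?h=index,uuid'
# print the matching index name. No output means no live index claims that
# UUID, i.e. the directory is a dangling-index candidate.
uuid_to_index() {
  local uuid="$1" cat_file="$2"
  awk -v u="$uuid" '$2 == u { print $1 }' "$cat_file"
}
```

Running it over every directory under the node's `indices/` path gives you the name for each, and flags the nameless ones for cleanup.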


(Christian Dahlqvist) #4

As you have 4 master-eligible nodes, do you have minimum_master_nodes set to 3 in order to avoid split brain scenarios according to these guidelines?


(James Barwick) #5

Here is the relevant config on ALL nodes (configured by Chef)...

# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: [ monitor, monitor1, monitor2, monitor3, monitor4 ]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
#
# For more information, see the documentation at:
#

(Christian Dahlqvist) #6

That is good and rules that out as a potential cause. Which version of Elasticsearch are you running?


(andy_zhou) #7

I think you may have had one data node offline for several days.
You could test this:
restart all nodes and see whether the deleted indices come back or not.
Delete the indices with curl and test again.
Also check whether the indices are daily (xxx-2018.07.10) or monthly (xxx-2018.07).


(James Barwick) #8

I will test again, but it won't be today!!


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.