Dangling Index Imported Leads to Unassigned Shards

Curator calls the DELETE API to delete indices, so you should get the same result regardless of how you issue the delete.
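
To make "the same call" concrete, here is a minimal sketch of the request that an index delete boils down to: an HTTP DELETE on the index name. The host `http://localhost:9200` and the helper name `build_delete_index_request` are assumptions for illustration only. The request is built but not sent.

```python
import urllib.request


def build_delete_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that deletes an index."""
    url = f"{base_url.rstrip('/')}/{index}"
    return urllib.request.Request(url, method="DELETE")


# Hypothetical local node; adjust the URL for a real cluster.
req = build_delete_index_request("http://localhost:9200", "foo")
print(req.get_method(), req.full_url)  # DELETE http://localhost:9200/foo

# To actually send it: urllib.request.urlopen(req)
```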

The issue happens in the following scenario, which is possible on a pre-5.x cluster:

  • the index foo is created and allocates its shards on some nodes, including nodeA
  • nodeA is shut down
  • the index foo is deleted from the cluster, and all traces of it are removed from the cluster state and from the file systems of all nodes that are currently in the cluster. nodeA still has these files because it wasn't part of the cluster when the index was deleted.
  • nodeA rejoins the cluster, and the cluster finds some shards that belong to the now-unknown index foo. What do we do? We could have ignored the finding and deleted these files, but, just in case, we try to import them as a dangling index, on the theory that the annoyance of having to delete the index twice is better than potentially losing data.
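
The sequence above can be replayed with a toy model. This is only a sketch of the bookkeeping, not Elasticsearch code: the `Node` and `Cluster` classes are hypothetical, and a real cluster tracks shards and metadata rather than plain sets of index names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.on_disk = set()      # index names with data on this node's disk
        self.in_cluster = True


class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.indices = set()      # indices known to the cluster state

    def create_index(self, index):
        self.indices.add(index)
        for node in self.nodes.values():
            if node.in_cluster:
                node.on_disk.add(index)

    def delete_index(self, index):
        # Only nodes that are currently part of the cluster clean up their files.
        self.indices.discard(index)
        for node in self.nodes.values():
            if node.in_cluster:
                node.on_disk.discard(index)

    def shutdown(self, name):
        self.nodes[name].in_cluster = False

    def rejoin(self, name):
        # Pre-5.x behaviour in this toy model: unknown indices found on disk
        # are imported back into the cluster state as dangling indices.
        node = self.nodes[name]
        node.in_cluster = True
        dangling = node.on_disk - self.indices
        self.indices |= dangling
        return dangling


cluster = Cluster([Node("nodeA"), Node("nodeB")])
cluster.create_index("foo")
cluster.shutdown("nodeA")
cluster.delete_index("foo")
dangling = cluster.rejoin("nodeA")
print(sorted(dangling))          # ['foo']
print("foo" in cluster.indices)  # True: the deleted index is back
```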

So, I see 3 possible ways to solve this issue:

  1. avoid the scenario described above by not deleting indices while nodes are restarting (probably not very practical)
  2. clean the file system on the nodes that were shut down before allowing them to rejoin the cluster, by removing the directories corresponding to the indices that have been deleted (a bit more practical, but requires some coding and might lead to data loss if not implemented correctly)
  3. upgrade the cluster to 5.x, where this problem was solved by keeping track of deleted indices in the cluster state for some period of time after they were deleted.
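
For option 2, the "some coding" could look roughly like the sketch below. It rests on assumptions you should verify for your version: that the stopped node keeps one directory per index name under a single `indices` directory (something like `<path.data>/<cluster_name>/nodes/0/indices` on pre-5.x), and that you pass in an accurate set of live index names obtained from the running cluster. The function names are hypothetical. It defaults to a dry run, because a wrong `live_indices` set would delete real data.

```python
import shutil
from pathlib import Path


def find_stale_index_dirs(indices_dir, live_indices):
    """Return index directories on disk whose names are not in live_indices."""
    root = Path(indices_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_dir() and p.name not in live_indices
    )


def remove_stale_index_dirs(indices_dir, live_indices, dry_run=True):
    """Remove (or, in a dry run, only report) stale index directories."""
    stale = find_stale_index_dirs(indices_dir, live_indices)
    for path in stale:
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)
    return stale
```

Run it only while the node is stopped, inspect the dry-run output first, and then call it again with `dry_run=False`.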

Obviously, my vote would be to go with option 3.
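
To illustrate why option 3 fixes the problem, here is a toy decision function. It is not the actual Elasticsearch implementation: the `rejoin` function is hypothetical, and it matches deleted indices by name only, which is a simplification. The point is that once the cluster state remembers recent deletions, an unknown index found on a rejoining node can be recognised as deleted rather than imported.

```python
def rejoin(cluster_indices, tombstones, node_on_disk):
    """Decide the fate of indices found on a rejoining node's disk.

    Returns (imported, deleted): indices imported as dangling, and indices
    recognised as already deleted and therefore removed from the node.
    """
    unknown = node_on_disk - cluster_indices
    deleted = unknown & tombstones
    imported = unknown - tombstones
    return imported, deleted


# Without a record of deletions (pre-5.x): foo is imported as dangling.
print(rejoin(set(), set(), {"foo"}))    # ({'foo'}, set())

# With the deletion remembered in the cluster state (5.x-style): foo is dropped.
print(rejoin(set(), {"foo"}, {"foo"}))  # (set(), {'foo'})
```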