Old shards not deleted upon relocation

uday · January 10, 2017, 11:00pm

Hi,

Our Elasticsearch cluster has two data directories. We recently restarted all the nodes in the cluster. After the restart completed successfully, we observed increased disk space usage on a few nodes. When we examined the folders inside the data directories, we found orphaned shards. For example, an orphaned shard "15" exists at data_dir0/cluster_name/nodes/0/indices/index_name/15, while one of the replicas of the same shard "15" exists on the same node inside the other data directory, at data_dir1/cluster_name/nodes/0/indices/index_name/15. The shard "15" in data_dir1 is included in the cluster metadata, so we assume that the shard "15" in data_dir0 is orphaned and should be deleted by Elasticsearch. But Elasticsearch still hasn't deleted it, 6 days after the last restart.

We found the topic "Old shards on re-joining nodes useful?" related to our issue, but it did not help us, as ES did not clean up the orphaned shard in our case.

Any help to delete orphaned shards and recover the disk space is highly appreciated.

Thanks

ywelsch · January 11, 2017, 3:56pm

Is the cluster green? Shard data is only deleted if there are enough shard copies in the cluster (i.e. the shard is fully allocated with no unassigned copies).

uday · January 11, 2017, 8:04pm

Yes, the cluster is green, and the restart completed successfully 6 days ago. But we still see old shards on the disk.

ywelsch · January 12, 2017, 1:52pm

Sorry, I misread your first post. The issue is that cleanup of shard data on one data path does not happen if the shard is allocated on the same node on another data path. The question is why the node decided to allocate the shard on a different data path (it will normally reuse the same path if there is shard data already there). Is it possible that the shard on the previous data path was never fully allocated (i.e. it was initializing but never started)? What ES version is this?
Can you provide the folder content of both directories?

tree data_dir0/cluster_name/nodes/0/indices/index_name/15 and
tree data_dir1/cluster_name/nodes/0/indices/index_name/15.

I'm in particular looking for a state-* file in the orphaned shard folder.
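
For readers who want to check this without `tree`, here is a minimal Python sketch. It assumes the shard-level state file lives in a `_state` subfolder of the shard directory and is named `state-<N>.st`, which is my understanding of the ES 2.x on-disk layout; the function name `has_shard_state` is made up for illustration. It only reads the directory and changes nothing.

```python
from pathlib import Path


def has_shard_state(shard_dir):
    """Return True if shard_dir contains at least one _state/state-* file.

    Assumption: shard state is stored as <shard_dir>/_state/state-<N>.st.
    """
    state_dir = Path(shard_dir) / "_state"
    # A missing _state folder, or an empty one, both count as "no state".
    return state_dir.is_dir() and any(state_dir.glob("state-*"))


if __name__ == "__main__":
    # Paths as used in this thread; adjust to the real data directories.
    for path in (
        "data_dir0/cluster_name/nodes/0/indices/index_name/15",
        "data_dir1/cluster_name/nodes/0/indices/index_name/15",
    ):
        print(path, "has state file:", has_shard_state(path))
```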

uday · January 12, 2017, 11:43pm

We don't have logs or other evidence of the earlier shard distribution, so we cannot say whether the old shard was fully allocated. Our ES version is 2.3.3. The folder contents on both disks are:

Old shard: http://pastebin.com/cU1UFXqD
New shard: http://pastebin.com/gVPTmMHX

Please let us know how we can remove the old shards and recover the disk space. A link to official documentation or a related issue would really help.

Thanks!

ywelsch · January 13, 2017, 1:10pm

The missing directory /storage/disk0/elasticsearch/cluster_name/nodes/0/indices/index_name/_state indicates that the shard failed to be fully allocated to the node in an earlier recovery attempt. When allocating a shard, ES uses the directory that already has shard data. This directory is identified by the _state file (which is missing in this case due to the unfinished recovery attempt). In your case, the node picked the other data directory to allocate the shard because it could not see the existing shard data. The stale shard data was also not cleaned up, as ES only deletes shard directories if the shard is not allocated to the current node.
The first issue will not occur on ES v5.x as the recovery process has been changed in that regard. I think there is no easy fix here except to manually delete the directories.
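
To find candidates before deleting anything by hand, a rough sketch like the following could list shards that exist under more than one data path of the same node and show which copies lack a state file. It assumes the `<data_path>/<cluster_name>/nodes/<ordinal>/indices/<index>/<shard>` layout seen in this thread and a `_state/state-*` file per started shard; `find_duplicate_shards` is a hypothetical helper, not part of Elasticsearch. It is read-only and deletes nothing. Whether a reported directory is safe to remove still has to be confirmed against the cluster state.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_shards(data_paths, cluster_name, node_ordinal=0):
    """Map (index, shard_id) -> [(shard_dir, has_state), ...] for every shard
    found under more than one of the given data paths.

    Assumed layout: <data_path>/<cluster_name>/nodes/<ordinal>/indices/<index>/<shard>
    """
    seen = defaultdict(list)
    for data_path in data_paths:
        indices_dir = (
            Path(data_path) / cluster_name / "nodes" / str(node_ordinal) / "indices"
        )
        if not indices_dir.is_dir():
            continue
        for index_dir in indices_dir.iterdir():
            if not index_dir.is_dir():
                continue
            for shard_dir in index_dir.iterdir():
                # Shard folders are numeric; this skips the index-level _state folder.
                if not (shard_dir.is_dir() and shard_dir.name.isdigit()):
                    continue
                state_dir = shard_dir / "_state"
                has_state = state_dir.is_dir() and any(state_dir.glob("state-*"))
                key = (index_dir.name, int(shard_dir.name))
                seen[key].append((str(shard_dir), has_state))
    # Keep only shards that show up in more than one data path.
    return {key: dirs for key, dirs in seen.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Directory and cluster names as used in this thread; adjust as needed.
    duplicates = find_duplicate_shards(["data_dir0", "data_dir1"], "cluster_name")
    for (index, shard), copies in sorted(duplicates.items()):
        for shard_dir, has_state in copies:
            label = "has state file" if has_state else "NO state file (possibly stale)"
            print(f"{index}[{shard}] {shard_dir}: {label}")
```

A copy without a state file that sits next to a copy with one matches the situation described above, but the script only reports what is on disk.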

uday · January 23, 2017, 9:44pm

Thanks for the help @ywelsch. We do not want to delete the Elasticsearch folders manually, as we are dealing with this problem in production. Any official link recommending the manual delete would be very helpful. Otherwise, we are going to replace the affected nodes one by one, even though it will incur huge data transfer costs.

ywelsch · January 25, 2017, 10:31am

There is nothing in the documentation covering this.

