Removing original index after split

kalikasan · February 2, 2022, 2:49pm

Hi,

I have a single-shard index of 75GB that takes 40% of my physical disk space (according to df), and I needed to split it into two shards. After the split, the new index takes roughly double that (175GB according to Kibana/Index Management), and the figure is slowly going down, which is expected AFAIU. Together, both indexes now take only 43% of my disk space (according to df), and that figure is slowly going up. This also seems expected since, IIUC, the hard links created during the split are progressively being replaced by copied data. Please correct me if I am wrong.

My question is: when is it safe to remove the original index from my cluster? I expect that I have to wait for the whole splitting process to finish. Is there a command I could use to monitor the disk operations that are taking place? I've tried monitoring with GET _cat/recovery?v, which returns the following:

index            shard time  type           stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
original_index   0     48.8s existing_store done  n/a         n/a         172.18.0.2  es-primary  n/a        n/a      0     0               100.0%        356         0     0               100.0%        77571827169 0            0                      100.0%
new_index        0     2.6m  local_shards   done  n/a         n/a         172.18.0.2  es-primary  n/a        n/a      0     0               100.0%        355         0     0               100.0%        77571822082 0            0                      100.0%
new_index        1     2.6m  local_shards   done  n/a         n/a         172.18.0.2  es-primary  n/a        n/a      0     0               100.0%        355         0     0               100.0%        77571822082 0            0                      100.0%

As far as I understand, no recovery is pending or in progress, unless I am looking in the wrong place. Could someone explain, like I am five, how to monitor the process and how to tell when it is the right time to delete original_index?

Thanks a lot!

warkolm · February 2, 2022, 11:11pm

Welcome to our community!

Once the split process has completed, you can delete the original index. If the split is still running, you should see it in _cat/tasks?v.
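
As a rough illustration of the "has the split completed?" check, here is a minimal Python sketch. It assumes the recovery rows were fetched separately with GET _cat/recovery?format=json, which returns one object per shard using the same column names as the output posted above. The helper name split_finished is hypothetical, and treating stage "done" on every shard of the new index as the completion signal is an assumption, not an official rule; it says nothing about how much disk space later segment merges will reclaim.

```python
def split_finished(recovery_rows, target_index):
    """Return True when every shard of target_index reports stage 'done'.

    recovery_rows: list of dicts, as returned by GET _cat/recovery?format=json
    target_index:  name of the index created by the split
    """
    rows = [r for r in recovery_rows if r.get("index") == target_index]
    # No rows at all means the recovery API does not list the index yet,
    # so it cannot be considered finished.
    return bool(rows) and all(r.get("stage") == "done" for r in rows)


# Rows mirroring the relevant columns of the output posted above.
rows = [
    {"index": "original_index", "shard": "0", "type": "existing_store", "stage": "done"},
    {"index": "new_index", "shard": "0", "type": "local_shards", "stage": "done"},
    {"index": "new_index", "shard": "1", "type": "local_shards", "stage": "done"},
]

print(split_finished(rows, "new_index"))
```

With the rows shown in the question, both shards of new_index are in stage done, so the check passes. Confirming separately that the new index is searchable and holds the expected document count before deleting original_index would be a sensible extra precaution.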

