Are you asking about the total disk consumption as reported by the OS (e.g. using `df`) or do you mean just the cloned/split index (e.g. using `GET _cat/indices`)? The latter double-counts the actual disk space used because of the use of hard links: `GET _cat/indices` should report the size of a clone as identical to the size of the original index.
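To illustrate the difference, here is a rough sketch of the two views. The endpoint `localhost:9200`, the data path `/var/lib/elasticsearch`, and the index names `my-index` and `my-index-clone` are all placeholders for your own setup:

```sh
# Size as reported by Elasticsearch: hard-linked segment files are
# counted in full for every index that references them, so a fresh
# clone shows the same store.size as its source.
curl -s 'localhost:9200/_cat/indices/my-index,my-index-clone?v&h=index,store.size'

# Size as reported by the OS: hard links are counted once, so the
# clone adds almost nothing here.
df -h /var/lib/elasticsearch
```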
Splitting the index works by cloning all the shards (multiple times) and then effectively running a delete-by-query on each copy, which certainly increases the reported size until merging cleans up the deleted docs. If you're still writing to this index then that will happen in time; if you're not still writing to this index then you can try force-merging to make it happen sooner, as shown in the sketch below.

There's also some per-shard disk space overhead. In particular the terms dictionary tends to be large, and it doesn't get much smaller after a split since most shards contain roughly the same set of terms.
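A minimal sketch of checking for and cleaning up the deleted docs, again assuming a placeholder index name (`my-split-index`) and endpoint:

```sh
# See how many deleted docs are still taking up space after the split:
curl -s 'localhost:9200/_cat/indices/my-split-index?v&h=index,docs.count,docs.deleted,store.size'

# Rewrite segments to expunge the deleted docs; only worth doing once
# you've stopped writing to the index.
curl -s -X POST 'localhost:9200/my-split-index/_forcemerge?only_expunge_deletes=true'
```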