Cloned/Split Indexes Take Double Disk Space When Increasing Shards

DavidTurner · November 5, 2020, 9:35am

Are you asking about the total disk consumption as reported by the OS (e.g. using df), or just the size reported for the cloned/split index (e.g. using GET _cat/indices)? The latter double-counts the disk space actually used, because the clone shares its files with the original via hard links and each index reports those files as its own.

GET _cat/indices should report the size of a clone to be identical to the size of the original index.
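To illustrate the hard-link double-counting, here is a minimal sketch in Python. It does not touch Elasticsearch at all: the file names are hypothetical stand-ins for segment files, and the two helper functions are made up for this example. It only shows why adding up per-index sizes gives a larger number than what the filesystem actually stores.

```python
import os
import tempfile


def naive_size(paths):
    """Sum the size of every path, so hard-linked data is counted once per link."""
    return sum(os.stat(p).st_size for p in paths)


def physical_size(paths):
    """Sum sizes while counting each (device, inode) pair only once."""
    seen = {}
    for p in paths:
        st = os.stat(p)
        seen[(st.st_dev, st.st_ino)] = st.st_size
    return sum(seen.values())


def demo():
    """Create a 4096-byte file plus a hard link to it and measure both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names standing in for a segment file in each index.
        original = os.path.join(tmp, "original_segment")
        clone = os.path.join(tmp, "cloned_segment")
        with open(original, "wb") as f:
            f.write(b"x" * 4096)
        os.link(original, clone)  # second name, same data on disk
        paths = [original, clone]
        return naive_size(paths), physical_size(paths)


if __name__ == "__main__":
    naive, physical = demo()
    print(f"sum of per-file sizes: {naive} bytes")
    print(f"unique data on disk:   {physical} bytes")
```

The first number is the analogue of adding up the per-index sizes, the second is closer to what df would attribute to those files.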

Splitting the index works by cloning all the shards (multiple times) and then effectively running a delete-by-query on each copy to remove the documents that belong to other shards. This certainly increases the reported size until merging cleans up the deleted docs. If you're still writing to this index then that'll happen in time; if you're not, you can try force-merging to make it happen sooner. There's also some per-shard disk space overhead: in particular, the terms dictionary tends to be large and doesn't get much smaller after a split, since most shards contain roughly the same set of terms.
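As a rough sketch of the force-merge suggestion, the snippet below builds and sends a request to the force merge API using only the Python standard library. The base URL, the index name `my-split-index`, and the lack of authentication are all assumptions for illustration; `only_expunge_deletes` is a force merge parameter that restricts the merge to segments holding deleted documents, and whether that is enough for your index is something to check against the documentation for your version.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_forcemerge_request(base_url, index, only_expunge_deletes=True):
    """Build (but do not send) a POST /<index>/_forcemerge request."""
    query = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"
    return urllib.request.Request(url, method="POST")


def force_merge(base_url, index):
    """Send the request and return the HTTP status code.

    Assumes an unsecured cluster reachable at base_url; a secured cluster
    would need credentials and TLS settings added here.
    """
    request = build_forcemerge_request(base_url, index)
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host and index name; adjust for your cluster.
    print(force_merge("http://localhost:9200", "my-split-index"))
```

A force merge can be expensive on a large index, so it is probably best run once writes to the index have stopped, as described above.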
