Index size explodes after _split

Recently I had to split an index that had grown to 100GB on a single shard. I used the Split API to split it into 8 shards, which almost instantly created the new index with all 8 shards on the same node. I then deleted the old index and created an alias pointing to the new one.
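
For reference, here is roughly the sequence of calls I ran, sketched in Python with `requests` (the cluster address and the index/alias names `logs-v1`, `logs-v2`, `logs` are placeholders, not the real ones):

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# The source index has to be blocked for writes before it can be split
requests.put(f"{ES}/logs-v1/_settings",
             json={"index.blocks.write": True}).raise_for_status()

# Split the single-shard index into 8 primary shards
requests.post(f"{ES}/logs-v1/_split/logs-v2",
              json={"settings": {"index.number_of_shards": 8}}).raise_for_status()

# Clear the write block carried over to the target, point the alias at it,
# and drop the old index
requests.put(f"{ES}/logs-v2/_settings",
             json={"index.blocks.write": None}).raise_for_status()
requests.post(f"{ES}/_aliases", json={
    "actions": [{"add": {"index": "logs-v2", "alias": "logs"}}]
}).raise_for_status()
requests.delete(f"{ES}/logs-v1").raise_for_status()
```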

I noticed the primary store size had grown from 100GB to about 650GB. When I enabled replicas and waited for the cluster to rebalance, I ran into out-of-disk issues over the weekend, causing my entire cluster to go into read-only mode.

I deleted the replicas, which freed up some disk space and unblocked the cluster.
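
For anyone else who hits the flood-stage watermark, this is roughly what the unblocking looked like (same placeholder names as above). Note that older versions require clearing the `read_only_allow_delete` block explicitly, while newer ones remove it automatically once disk usage drops back below the watermark:

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# Drop the replicas to reclaim disk space
requests.put(f"{ES}/logs-v2/_settings",
             json={"index.number_of_replicas": 0}).raise_for_status()

# Clear the read-only block that the flood-stage watermark put on the index
requests.put(f"{ES}/logs-v2/_settings",
             json={"index.blocks.read_only_allow_delete": None}).raise_for_status()
```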

Over the course of a week (!) the index shrank back down to its normal 100GB size across the 8 shards.

How can I prevent this from happening in the future? Is there a way to split an index without its size exploding? Can forcemerge be used for this, and what would the impact be if the index is constantly being read from and written to?

This is expected, per the docs:

Splitting works as follows:

  • First, it creates a new target index with the same definition as the source index, but with a larger number of primary shards.
  • Then it hard-links segments from the source index into the target index. (If the file system doesn’t support hard-linking, then all segments are copied into the new index, which is a much more time consuming process.)
  • Once the low level files are created all documents will be hashed again to delete documents that belong to a different shard.
  • Finally, it recovers the target index as though it were a closed index which had just been re-opened.
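
In other words, right after the split each new shard still hard-links the full set of source segments, and the documents that now belong to other shards are only marked as deleted; the disk space comes back gradually as those segments get merged away, which lines up with the week it took to shrink. You can watch the deleted documents and store size drop with something like this (placeholder index name again):

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# Per-shard document counts and store size for the split index
r = requests.get(f"{ES}/_cat/shards/logs-v2",
                 params={"v": "true", "h": "index,shard,prirep,docs,store"})
print(r.text)

# Deleted docs and primary store size; both shrink as segments merge
stats = requests.get(f"{ES}/logs-v2/_stats/docs,store").json()["_all"]["primaries"]
print("deleted docs:", stats["docs"]["deleted"])
print("primary store bytes:", stats["store"]["size_in_bytes"])
```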

So @warkolm, would you recommend running a forcemerge first, before enabling replicas? Splitting by a significant factor can easily fill up the disks on a decent-sized index.

I don't think that would help; doing one after should, though.
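
For anyone finding this later, the idea would be something along these lines once the split has finished, before replicas are switched back on (placeholder names again; `max_num_segments=1` is aggressive, and a force merge on an index that is still being written to is expensive, so a quiet period is best):

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# Rewrite the hard-linked segments so documents belonging to other shards
# are physically removed, instead of waiting for background merges
requests.post(f"{ES}/logs-v2/_forcemerge",
              params={"max_num_segments": 1}).raise_for_status()

# Only then re-enable replicas, so they copy the compacted shards
requests.put(f"{ES}/logs-v2/_settings",
             json={"index.number_of_replicas": 1}).raise_for_status()
```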
