How does Elasticsearch Splitting an Index Work?

Hao_Yellow · August 31, 2022, 7:18am

Hello, recently I've been working on splitting some indices whose shards are too large (not too numerous), so that free disk space is more evenly balanced across the nodes of the cluster.
I read a discussion (quoted below) in which @DavidTurner describes how a split works. As I understand it, each of the index's shards is cloned multiple times on the same node as the original shard; shard allocation then copies the cloned shards to their target nodes; finally, a delete-by-query runs on each shard to delete the redundant data.

But I guess there's more to it than that? Otherwise something could go wrong.

Let's say I have a cluster of 3 nodes:
node 1: free disk space 80GB
node 2: free disk space 80GB
node 3: free disk space 80GB
On node 3, there's an index that consists of a single 100GB shard. I want to split it into a new index with 4 shards, so each shard will be 25GB.
Based on the understanding above, the 100GB shard on node 3 will first be copied 4 times, which adds up to 400GB... So the first step, cloning, is already a problem: none of the nodes can accommodate another 100GB, let alone 400GB.
And what about shard allocation? It may happen implicitly as part of the cloning. If the operation is clone then delete-by-query, what happens if no node has enough free space to hold a copy of the original 100GB shard before the delete-by-query runs?
But I guess it's supposed to work anyway, and Elasticsearch is intelligent enough to work that out? After all, the total free disk space is sufficient for the end result.

Can you explain how it works?
Thanks.

Quoted from the discussion "Cloned/Split Indexes Take Double Disk Space When Increasing Shards":

Are you asking about the total disk consumption as reported by the OS (e.g. using df) or do you mean just for the cloned/split index (e.g. using GET _cat/indices)? The latter double-counts the actual disk space used because of the use of hard links.

GET _cat/indices should report the size of a clone to be identical to the size of the original index.
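
The double-counting effect of hard links can be reproduced outside Elasticsearch. The following is only an illustrative sketch, assuming a POSIX-style filesystem that supports hard links; the file names and the `hard_link_demo` helper are hypothetical and are not part of Elasticsearch. It creates one file, hard-links it, and compares a naive per-path size total with a total that counts each underlying file once.

```python
import os
import tempfile


def hard_link_demo(size_bytes: int = 1024 * 1024) -> dict:
    """Create a file plus a hard link to it and compare two size totals."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "segment_original")  # hypothetical name
        clone = os.path.join(tmp, "segment_clone")        # hypothetical name

        with open(original, "wb") as f:
            f.write(b"\0" * size_bytes)

        # A hard link is a second directory entry for the same file data.
        os.link(original, clone)

        paths = [original, clone]

        # Naive total: add up the size reported for every path.
        naive_total = sum(os.stat(p).st_size for p in paths)

        # De-duplicated total: count each (device, inode) pair once.
        unique = {}
        for p in paths:
            st = os.stat(p)
            unique[(st.st_dev, st.st_ino)] = st.st_size

        return {
            "naive_total": naive_total,
            "unique_total": sum(unique.values()),
            "link_count": os.stat(original).st_nlink,
        }


if __name__ == "__main__":
    print(hard_link_demo())
```

The naive total is twice the de-duplicated one, which mirrors the difference described above between a per-index size report and what the OS reports as used.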

Splitting the index works by cloning all the shards (multiple times) and then effectively running a delete-by-query on them, which certainly increases the reported size until merging cleans up the deleted docs. If you're still writing to this index then that'll happen in time; if you're not still writing to this index then you can try force-merging to make it happen sooner. There's also some per-shard disk space overhead -- particularly the terms dictionary tends to be large and not to get much smaller after a split since most shards contain roughly the same set of terms.
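
For orientation, the REST calls typically involved in a split can be sketched as plain data. This is a minimal sketch, not a definitive procedure: the index names are hypothetical, the helper `split_requests` is invented for illustration, and nothing is sent to a cluster. It assumes the source index is made read-only for writes before the split, and that a force merge with `only_expunge_deletes` is used afterwards to reclaim space sooner, as suggested in the quote above.

```python
import json


def split_requests(source: str, target: str, target_shards: int) -> list:
    """Return (method, path, body) tuples for a split; nothing is sent."""
    return [
        # Block writes on the source index before splitting it.
        ("PUT", f"/{source}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Split the source into a new index with more primary shards.
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
        # Once the split has finished, ask for deleted docs to be merged away.
        ("POST", f"/{target}/_forcemerge?only_expunge_deletes=true", None),
    ]


if __name__ == "__main__":
    # "my-index" and "my-index-split" are placeholder names.
    for method, path, body in split_requests("my-index", "my-index-split", 4):
        print(method, path, json.dumps(body) if body is not None else "")
```

The tuples can be passed to whatever HTTP client is in use; the force merge should only be issued after the target index's shards have started.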

Hao_Yellow · September 2, 2022, 3:25am

Anyone?

Hao_Yellow · September 5, 2022, 4:13am

Any ideas, especially from @DavidTurner? Or is it a wrong question to put here? Thanks.

Christian_Dahlqvist · September 5, 2022, 5:38am

David knows this area well, so I would go by what he said. If the nodes do not have enough free disk space, I would expect the operation to fail.

Hao_Yellow · September 6, 2022, 6:29am

Thanks, but I'm afraid that hasn't answered the question in this discussion.

Christian_Dahlqvist · September 6, 2022, 7:08am

If you have a single 100GB shard and only 80GB of free space you will not be able to split that shard, as the process David described needs to take place.
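
The arithmetic behind that statement can be written down as a rough check. This is a conservative sketch based only on what is said in this thread: it assumes the node holding the source shard needs room for a second full copy, and it ignores both the hard-link savings and any disk watermark settings. The `Node` class and `can_split_on` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_gb: float


def can_split_on(source_node: Node, source_shard_gb: float) -> bool:
    """Conservative rule of thumb: room for a second full copy is required."""
    return source_node.free_gb >= source_shard_gb


# The scenario from the original question.
nodes = [Node("node 1", 80), Node("node 2", 80), Node("node 3", 80)]
source_gb = 100
target_shards = 4

print(can_split_on(nodes[2], source_gb))  # False: 80GB free < 100GB shard
print(source_gb / target_shards)          # 25.0 (GB per shard after merging)
print(sum(n.free_gb for n in nodes))      # 240: total free space is not the limit
```

The total free space (240GB) would easily hold the end result; the check fails because of the intermediate state on a single node.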

Hao_Yellow · September 6, 2022, 7:24am

Yeah, maybe I have to accept that it's impossible. After all, when a shard is split, each new shard will have the same disk usage as the original one until a force merge.
Anyway, I will try this someday. Then I can confirm whether it is really impossible or merely unlikely.

Thank you!

DavidTurner · September 6, 2022, 8:26am

There are some optimisations that mean it might use less disk space than you expect, but ultimately if you don't have enough disk space then the operation will indeed fail.

Hao_Yellow · September 7, 2022, 7:27am

Well, that's late, but thanks! I'm not sure whether you have read and understood the entire scenario in my question... I will try it on a real cluster myself. Then there may be a more accurate and practical answer for my case.

Christian_Dahlqvist · September 7, 2022, 7:52am

It is indeed a problem, and it is further complicated because the delete by query will require a lot of additional disk space, as described in David's response.

I do not think the new shards can get allocated to another node until the delete by query has completed on the original node. Either way it does not work, as transferring a 100GB shard before the delete by query runs is not possible.

No. Splitting an index without the required additional disk space will not work.

Hao_Yellow · September 7, 2022, 8:08am

That's clear. Thank you! But that's really not ideal. Normally, when someone splits an index, they shouldn't have to worry about the additional disk space, nor should they have to know how the split process works internally (clone, delete-by-query, etc.), which, as far as I can find, is not mentioned in the Elasticsearch Guide.

Best wishes.

DavidTurner · September 7, 2022, 8:21am

I contribute time to this forum voluntarily. If you need your questions to be answered with an SLA for timeliness then you will need to engage with the support or consulting teams on a commercial basis.

Furthermore, it's rather impolite to ping me (twice!) when I've not already expressed interest in your problem. Your poor manners make it much less likely that you will get a useful response here.

Yes, I have.

Hao_Yellow · September 7, 2022, 9:05am

Thanks for letting me know. And sorry if my mentioning you twice bothered you; I never meant it to.
I mentioned you twice because I knew you from before (you answered one of my previous questions) and I was afraid you might have missed this question; I thought you would be glad to help (or at least reply to say you weren't interested).

And I have no idea why you called that "poor manners". I was really just looking for help.

warkolm · September 7, 2022, 9:11am

Please don't ping people who aren't already part of a topic. They may have any number of reasons for not being involved, and they are under no obligation to even express disinterest. Pinging them in this manner because you want an answer is not really polite.

Hao_Yellow · September 7, 2022, 9:16am

I see. So it's like it's not really polite to say "hello" or "help" to a stranger who seems distant and indifferent?
I really didn't know this rule before. Now that I know, I'll respect it.
Thank you.

warkolm · September 7, 2022, 9:18am

The alternate way of looking at that is that you're ignoring their boundaries (implicit or not) and forcing yourself on them.

It's not a rule, but we do discourage it, and thank you for accepting feedback on it.

Hao_Yellow · September 7, 2022, 9:25am

Yeah.

Anyway, I am experimenting with splitting indices myself. I will post the result here later when it's done.

Hao_Yellow · September 26, 2022, 1:37am

So the answer to the scenario is: the split will not work. After trying to split, the new shards will be UNASSIGNED. If that happens, one may delete the new index.
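
One way to confirm that outcome is to look at the shard states of the new index and ask the cluster why a shard is unassigned. The sketch below only processes data and builds request descriptions; it does not contact a cluster. It assumes rows shaped like the JSON output of `GET _cat/shards/<index>?format=json`; the sample rows, the index name, and the helper names are hypothetical.

```python
def unassigned_shards(cat_shards_rows: list) -> list:
    """Pick out (index, shard, prirep) for rows whose state is UNASSIGNED."""
    return [
        (row["index"], row["shard"], row["prirep"])
        for row in cat_shards_rows
        if row["state"] == "UNASSIGNED"
    ]


def explain_request(index: str, shard: int) -> tuple:
    """Describe an allocation-explain call for one primary shard."""
    body = {"index": index, "shard": shard, "primary": True}
    return ("GET", "/_cluster/allocation/explain", body)


# Hypothetical sample rows, not output captured from a real cluster.
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "my-index-split", "shard": "0", "prirep": "p", "state": "UNASSIGNED"},
    {"index": "my-index-split", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
]

for index, shard, prirep in unassigned_shards(sample_rows):
    print(explain_request(index, int(shard)))
```

If the explanation points at insufficient disk space, deleting the new index (`DELETE /<new index>`) removes the unassigned shards, as noted above.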

system · October 24, 2022, 1:38am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
