How Does Splitting an Index Work in Elasticsearch?

Hello. I've recently been splitting some indices whose shards are too large (there aren't too many of them), so that free disk space is more evenly balanced across the nodes of the cluster.
I read a discussion (see the discussion quote below) in which @DavidTurner replied. As I understand his reply, when an index is split, each of its shards is cloned multiple times on the same node as the original shard; the cloned shards are then moved to their target nodes by shard allocation; and finally a delete-by-query runs on each shard to delete the redundant data.

But I guess there's more to it than that? Otherwise something could go wrong.

Let's say my cluster has 3 nodes:
node 1: free disk space 80GB
node 2: free disk space 80GB
node 3: free disk space 80GB
On node 3, there's an index that consists of a single 100GB shard. I want to split it into a new index with 4 shards, so each new shard will be 25GB.
Based on the understanding above, the 100GB shard on node 3 will first be copied 4 times, which adds up to 400GB... So the first step, cloning, is already a problem. None of the nodes can accommodate another 100GB, let alone 400GB.
And what about shard allocation? This process may be implicit in the cloning. If the operation is clone followed by delete-by-query, what happens if none of the nodes has enough space to hold a 100GB copy of the original shard before the delete-by-query runs?
But I guess it's supposed to work anyway? And Elasticsearch is intelligent enough to work that out? After all, the total free disk space is sufficient for the end result.
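To make the arithmetic in this scenario concrete, here is a minimal sketch in Python. It assumes the pessimistic reading described above, where every target shard starts out as a full-size copy of the source shard on the source shard's node. The function name and its parameters are made up for illustration, and the real split may need less space than this.

```python
def worst_case_split_fits(source_shard_gb, target_shards, free_gb_on_source_node):
    """Return (required_gb, fits) under the pessimistic assumption that each
    target shard is a full-size copy of the source shard on the same node."""
    required_gb = source_shard_gb * target_shards
    return required_gb, required_gb <= free_gb_on_source_node


# Scenario from this post: one 100GB shard on node 3, 80GB free, split into 4.
required, fits = worst_case_split_fits(100, 4, 80)
print(f"worst case needs {required}GB, fits: {fits}")

# Even a single full-size copy is larger than the 80GB free on any one node.
_, one_copy_fits = worst_case_split_fits(100, 1, 80)
print(f"one full copy fits: {one_copy_fits}")
```

Under these assumptions neither the 4 copies nor a single copy fit, which is exactly the concern raised above.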

Can you explain how it works?
Thanks.

Quoted from the discussion "Cloned/Split Indexes Take Double Disk Space When Increasing Shards":

Anyone?

Any ideas, especially from @DavidTurner? Or is this the wrong place to ask this question? Thanks.

David knows this area well, so I would go by what he said. If the nodes do not have enough free disk space, I would expect the operation to fail.

Thanks, but I'm afraid that hasn't answered the question in this discussion.

If you have a single 100GB shard and only 80GB of free space, you will not be able to split that shard, as the process David described needs to take place.

Yeah, maybe I have to accept that it's impossible. After all, when a shard is split, each new shard has the same disk usage as the original one until a force merge.
Anyway, I will give this a try someday. Then I can verify whether it's really impossible or just unlikely to work.
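For anyone who wants to try the same thing, the requests involved can be sketched as below. This only builds the request descriptions and does not send anything to a cluster. The index names `my-index` and `my-index-split` are placeholders, and the sketch assumes the documented split prerequisites are met: the source index is blocked for writes, and the target shard count is a multiple of the source shard count.

```python
import json


def build_split_requests(source_index, target_index, target_shards):
    """Build (method, path, body) tuples for splitting an index.

    Nothing is sent to a cluster; the index names are placeholders."""
    return [
        # The source index must be blocked for writes before it can be split.
        ("PUT", f"/{source_index}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Ask Elasticsearch to split into a new index with more primary shards.
        ("POST", f"/{source_index}/_split/{target_index}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


requests_to_send = build_split_requests("my-index", "my-index-split", 4)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

The printed lines can be pasted into a console of your choice; whether the split then succeeds still depends on the free disk space discussed in this thread.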

Thank you!

There are some optimisations that mean it might use less disk space than you expect, but ultimately if you don't have enough disk space then the operation will indeed fail.

Well, that's late, but thanks! I'm not sure whether you have read and understood my entire question and its scenario... I will try this on a real cluster myself. Then there may be a more accurate and practical answer for my case.

It is indeed a problem, and it is further complicated because the delete by query will require a lot of additional disk space, as described in David's response:

I do not think the new shards can get allocated to another node until the delete by query has completed on the original node. Either way it does not work, as transferring a 100GB shard before the delete by query runs is not possible.

No. Splitting an index without the required additional disk space will not work.

That's clear. Thank you! But that's really not ideal. Normally, someone splitting an index shouldn't have to worry about the additional disk space, nor should they have to know how the split process works internally (clone, delete-by-query, etc.), which, as far as I can find, is not mentioned in the Elasticsearch Guide.

Best wishes.

I contribute time to this forum voluntarily. If you need your questions to be answered with an SLA for timeliness then you will need to engage with the support or consulting teams on a commercial basis.

Furthermore, it's rather impolite to ping me (twice!) when I've not already expressed interest in your problem. Your poor manners make it much less likely that you will get a useful response here.

Yes, I have.

Thanks for letting me know. And sorry if my mentioning you twice bothered you; I never meant it to.
I mentioned you twice because I knew of you from before (you answered one of my previous questions) and I was afraid you might have missed this question; I thought you would be glad to help (or at least reply to say you weren't interested).

And I have no idea why you call that "poor manners". I was really just looking for help.

Please don't ping people that aren't already part of a topic. They will have any number of reasons for not being involved and they are under no obligation to even express disinterest. Pinging them in this manner because you want an answer is not really polite.

I see. It's like it's not really polite to say "hello" or "help" to a stranger that seems distant and indifferent?
I really didn't know this rule before. Now that I know, I will respect it.
Thank you.

The alternate way of looking at that is that you're ignoring their boundaries (implicit or not) and forcing yourself on them.

It's not a rule, but we do discourage it. Thank you for accepting feedback on it.

Yeah.

Anyway, I am myself experimenting with this splitting indices thing. I will put the result here later when it's done.

So the answer to the scenario is: the split will not work. After the split attempt, the new shards will be UNASSIGNED. If that happens, one may delete the new index.
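For reference, here is a minimal sketch in Python of how that check could be scripted. It filters rows shaped like the JSON output of the cat shards API (`GET /_cat/shards?format=json`) for shards in the UNASSIGNED state. The sample rows and the index names `my-index` and `my-index-split` are hypothetical, not output from my cluster.

```python
def unassigned_shards(cat_shards_rows, index_name):
    """Return the shard numbers of index_name whose state is UNASSIGNED."""
    return sorted(
        int(row["shard"])
        for row in cat_shards_rows
        if row["index"] == index_name and row["state"] == "UNASSIGNED"
    )


# Hypothetical rows, shaped like GET /_cat/shards?format=json output.
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "node-3"},
]
sample_rows += [
    {"index": "my-index-split", "shard": str(n), "prirep": "p",
     "state": "UNASSIGNED", "node": None}
    for n in range(4)
]

stuck = unassigned_shards(sample_rows, "my-index-split")
if stuck:
    # Cleaning up means deleting the target index, not the source index.
    print(f"unassigned shards {stuck}; clean up with: DELETE /my-index-split")
```

The cluster allocation explain API (`GET /_cluster/allocation/explain`) should give the reason a particular shard is unassigned, if more detail is needed before deleting.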