When I add a new node, shards are rebalanced. Will inserts and updates be delayed while this happens? Roughly how long does it take? How does rebalancing work, and what is the principle behind it?

When I add a new node, shards are rebalanced, and a primary shard may move to the new node. Will inserts and updates be delayed while this happens? Roughly how long does it take? How does rebalancing work, and what is the principle behind it?

Even though the book is a bit dated by now, Elasticsearch: The Definitive Guide still helps a lot with understanding the basics. See https://www.elastic.co/guide/en/elasticsearch/guide/2.x/scale.html

Hope this helps as a start.

I could answer your questions with "It depends/It depends/Shard allocation", but I guess the discussion is easier with a bit of basics first :)

Thank you very much.

I read this document, but it does not explain this in detail.
When a primary shard is being moved to a new node, how are insert, update, and delete operations handled?
My guess is that during rebalancing the data is first copied to the new node, and then the old primary shard is deleted. If a modification arrives in the meantime, it is recorded in the translog and then replayed on the new primary shard. In that case, though, flushing would have to be disabled.
I'm not sure whether my guess is correct.

As a user who is writing data, you never have to care whether shards are being copied around right now, as long as a primary shard is available. So there is no change in your application code.
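If you still want to see whether a relocation is running, the cluster health API reports a `relocating_shards` count. Below is a minimal Python sketch of reading it. The base URL `http://localhost:9200` and both helper function names are assumptions made for illustration, and the sketch assumes a cluster that is reachable without authentication.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Fetch GET /_cluster/health and return the parsed JSON body.

    base_url is an assumption; point it at your own cluster.
    """
    with urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def relocation_in_progress(health):
    """Return True if the health response reports shards that are relocating."""
    return health.get("relocating_shards", 0) > 0


# Example usage against a running cluster (not executed here):
#   health = fetch_cluster_health()
#   print(health["status"], relocation_in_progress(health))
```

This only observes the relocation. Your indexing code stays the same whether or not the check returns true.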

In the background, the shard data is copied to the new node. While the copy is running, new data is still indexed on the primary and is also added to the shard copy being built. The translog is indeed one of the helper data structures involved: new write operations are appended to it.

I do not fully understand your last statement regarding flush. Maybe you can add some more context: which shard are you talking about, and what does it mean to 'close a flush' in that context?

Thanks

Thank you for your reply.
I think that when a primary shard is copied to another node, both the data on disk and the data in the translog are copied.
Suppose an insert arrives on the primary shard while it is being copied to another node. If a flush happens during this process, the data is committed to disk and the translog is cleared. At that point, the newly committed data would not have been copied to the new node.

What makes you think that? Any changes to the data are propagated until the shards are in sync.
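To make that concrete, here is a toy Python model of the idea. It is a deliberately simplified sketch and not Elasticsearch's actual recovery code: the `ShardCopy` class, the `relocate` function, and the operation tuples are all hypothetical names invented for this illustration. The one thing it shows is that if every write arriving during the copy is applied to the source and also forwarded to the target, then a flush on the source cannot make the target miss data.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """Toy shard copy: committed docs plus a translog of uncommitted operations."""
    committed: dict = field(default_factory=dict)
    translog: list = field(default_factory=list)

    def apply(self, op):
        # Every write is appended to this copy's own translog.
        self.translog.append(op)

    def flush(self):
        # Commit pending operations to the "on-disk" state, then clear the translog.
        self.committed = self.view()
        self.translog.clear()

    def view(self):
        # Current state: committed docs with pending translog operations replayed.
        docs = dict(self.committed)
        for action, doc_id, value in self.translog:
            if action == "delete":
                docs.pop(doc_id, None)
            else:  # "index" covers both insert and update in this toy model
                docs[doc_id] = value
        return docs


def relocate(source, writes_during_copy, flush_after=None):
    """Copy `source` to a new target while writes keep arriving.

    flush_after: index of the write after which the source flushes (or None).
    """
    target = ShardCopy(committed=dict(source.committed))  # copy the committed data
    for op in list(source.translog):                      # replay not-yet-committed ops
        target.apply(op)
    for i, op in enumerate(writes_during_copy):
        source.apply(op)   # the write is indexed on the current primary
        target.apply(op)   # and forwarded to the copy being built
        if i == flush_after:
            source.flush() # a flush on the source does not touch the target
    return target


source = ShardCopy(committed={1: "a"}, translog=[("index", 2, "b")])
writes = [("index", 3, "c"), ("index", 1, "a2"), ("delete", 2, None)]
target = relocate(source, writes, flush_after=0)
print(source.view() == target.view())
```

In this model the flush only changes where the source keeps its data (committed state versus translog). The target has its own translog, so it ends up with the same documents either way.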

I am mainly curious about how this specific process works, and why it involves zero downtime.
I am worried that after adding a node, moving the primary shard will affect the application's normal update operations on that shard.

I think I might understand what you mean. Could you point me to documentation on how data is synchronized between shards?

Here are a few links

When a relocation happens, the translog gets special handling. I am not sure whether this is documented anywhere besides the source, as it is more of an implementation detail. If you're curious, it might make sense to check out the source.

Hope this helps!

One more link that may help, even though it is no longer fully accurate:

Thank you so much.