How does elasticsearch move a primary shard?

Attila_Nagy · December 21, 2018, 8:46am

Hi,

I wonder what the exact process of moving a primary shard to another node is. I'm not interested in how to do this with the HTTP API, but in what happens when the user or Elasticsearch decides to do this.
How do the recovery and handover happen? Can the shard accept and perform all operations during the process, and how are they kept from getting lost or directed to the wrong primary?
Is it documented somewhere in detail?
If not, could somebody with enough knowledge do this?

Thanks,

dadoonet · December 21, 2018, 9:08am

A primary shard does not really move to another node per se.
When you have a primary shard on node1 and its replica on node2, and node1 stops, the replica on node2 is automatically marked as the primary.

In a sense it moved to node2 but not really physically I'd say.

If you restart node1, then the shard (which was primary) is started as a replica and recovers from the new primary (from node2).

Makes sense?

Attila_Nagy · December 21, 2018, 9:23am

I can move a primary shard to another node with _cluster/reroute.
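
For reference, a manual move of this kind is a `move` command sent to the cluster reroute API. Below is a minimal sketch of building that request body in Python. The index name, shard number, and node names are made-up placeholders, and the helper names are hypothetical; the resulting body would be sent as `POST /_cluster/reroute` with whatever HTTP client is at hand.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Build one `move` command for the cluster reroute API."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_body(commands):
    """Wrap a list of commands in the body for POST /_cluster/reroute."""
    return {"commands": list(commands)}


# Placeholder values, not taken from a real cluster.
body = reroute_body([move_command("logs-2018.12", 0, "node1", "node2")])
print(json.dumps(body, indent=2))
```
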
What I would like to know:

How exactly does the handover happen?
What happens to queries during the process (recovering the shard can take a long time, and during that time new data arrives at the primary and needs to be synced, etc.)?

It would be nice to have this explained in detail in order to understand its consequences.

Thanks,

dadoonet · December 21, 2018, 9:36am

The main question is why would you want to do that...

Anyway here are some high level explanations.
It creates the shard on the other node (let's say node2) and copies segments from node1 to node2. It then applies missing operations from the transaction log.
Once both shards are in sync, it promotes the shard on node 2 as the primary and removes the copy on node 1.
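
The steps above can be sketched as a toy model. This is only an illustration of the described sequence (create, copy, replay, hand over), not Elasticsearch's actual implementation; the `ShardCopy` class and `relocate_primary` function are hypothetical, and a plain dict stands in for Lucene segments.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    node: str
    docs: dict = field(default_factory=dict)  # stands in for segments
    is_primary: bool = False


def relocate_primary(source, target_node, ops_during_copy):
    """Toy relocation: copy a snapshot, replay missed operations, hand over."""
    # 1. Create the empty shard copy on the target node.
    target = ShardCopy(node=target_node)

    # 2. Copy a point-in-time snapshot of the source data ("segments").
    target.docs = dict(source.docs)

    # Writes keep arriving at the source while the copy is in progress;
    # they are recorded in a log (the "transaction log" in the post above).
    translog = []
    for doc_id, value in ops_during_copy:
        source.docs[doc_id] = value
        translog.append((doc_id, value))

    # 3. Replay the logged operations so the target catches up.
    for doc_id, value in translog:
        target.docs[doc_id] = value

    # 4. Both copies now hold the same data: hand over the primary role.
    assert target.docs == source.docs
    source.is_primary = False
    target.is_primary = True
    return target


old = ShardCopy(node="node1", docs={"a": 1, "b": 2}, is_primary=True)
new = relocate_primary(old, "node2", ops_during_copy=[("c", 3), ("a", 10)])
print(new.node, new.is_primary, new.docs)
```

The model deliberately ignores writes that arrive during the replay and handover steps themselves, which is the part the rest of this thread asks about.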

Attila_Nagy · December 21, 2018, 9:59am

It's great that elasticsearch has the ability to move shards manually!
I use this feature to achieve a better shard allocation, because the built-in allocator isn't really smart. It will happily put very high-traffic shards on the same nodes while placing low-traffic ones on others, resulting in a large imbalance between the cluster nodes.
Because I know the metrics, I can move them more appropriately.

But my real question is: on a high traffic shard there could be a state where the sync can't be reached. What happens then? Does elasticsearch forcibly move the shard and lose inflight operations?
Or does it pause new operations while a last sync happens?

dadoonet · December 21, 2018, 11:15am

A replica and a primary are basically getting the same load.

If you want to route shards to some specific servers, I'd not use manual shard allocation but shard allocation filtering.
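
As a rough illustration of shard allocation filtering, the sketch below builds the index settings that restrict an index to certain nodes. The setting prefix `index.routing.allocation.{include,exclude,require}` and the built-in `_name` attribute are standard; the helper function and the node names are hypothetical placeholders. The body would be applied with `PUT /<index>/_settings`.

```python
import json

VALID_KINDS = ("include", "exclude", "require")


def allocation_filter(kind, attribute, values):
    """Build an index-level shard allocation filter setting."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    key = f"index.routing.allocation.{kind}.{attribute}"
    # Multiple values are given as a comma-separated string.
    return {key: ",".join(values)}


# Placeholder node names: allow this index only on node1 or node2.
settings = allocation_filter("include", "_name", ["node1", "node2"])
print(json.dumps(settings))
```
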

"But my real question is: on a high traffic shard there could be a state where the sync can't be reached. What happens then? Does elasticsearch forcibly move the shard and lose inflight operations?"

If a shard cannot be reached, then the indexing operation will probably be rejected.
If the node cannot be reached, then after some time Elasticsearch will decide to reallocate the replicas somewhere else in the cluster.

But maybe I did not understand the question correctly. @Christian_Dahlqvist might know.

Attila_Nagy · December 21, 2018, 11:43am

Shard allocation filtering isn't really suited to this. I want to spread indices over all machines, but evenly, so if I have a very frequently used index, I want its shards evenly distributed.
Elasticsearch itself doesn't take this into account, which is why I'm doing this manually. But it's working fine...

No, the shard is available. My question is what happens during the handover initiated with a cluster reroute. Will queries get lost? Will elasticsearch reject some of them?
I would like to read about this in a detailed manner if possible, just to understand how this works and what to expect.
It would be good to have a doc page somewhere with this.

DavidTurner · December 21, 2018, 11:58am

I think you misunderstand: the primary doesn't "move" like this. It works by taking an existing replica and promoting it to primary, as mentioned earlier.

Since replicas hold all the same data as primaries anyway, no "last sync" is required. There is a short period of time during which operations might be routed to the old primary after it has been demoted, but there is a mechanism to reroute them to the new primary if this occurs.
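
The thread does not spell out the rerouting mechanism. The sketch below is only a toy illustration of the general idea (a demoted copy rejects the write, and the sender refreshes its routing and retries), not Elasticsearch's code. All class names are hypothetical, and replication to the other copy is left out to keep the focus on routing.

```python
class NotPrimaryError(Exception):
    """Raised by a copy asked to act as primary when it no longer is."""


class ShardCopy:
    def __init__(self, node, is_primary):
        self.node = node
        self.is_primary = is_primary
        self.ops = []

    def index(self, op):
        if not self.is_primary:
            raise NotPrimaryError(self.node)
        self.ops.append(op)


class Cluster:
    """Holds the authoritative routing: which node has the primary."""

    def __init__(self, copies, primary_node):
        self.copies = {c.node: c for c in copies}
        self.primary_node = primary_node

    def promote(self, node):
        self.copies[self.primary_node].is_primary = False
        self.copies[node].is_primary = True
        self.primary_node = node


class Coordinator:
    """Routes writes using a possibly stale cached view of the routing."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cached_primary = cluster.primary_node

    def write(self, op, max_retries=3):
        for _ in range(max_retries):
            target = self.cluster.copies[self.cached_primary]
            try:
                target.index(op)
                return target.node
            except NotPrimaryError:
                # Stale routing: refresh the view, retry on the new primary.
                self.cached_primary = self.cluster.primary_node
        raise RuntimeError("could not find the primary")


cluster = Cluster([ShardCopy("node1", True), ShardCopy("node2", False)], "node1")
coord = Coordinator(cluster)
coord.write("op-1")               # handled by node1
cluster.promote("node2")          # the coordinator's cache is now stale
handled_by = coord.write("op-2")  # rejected by node1, retried on node2
print(handled_by)
```

In this model the write is delayed by one retry rather than lost, which matches the behaviour described above.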

DavidTurner · December 21, 2018, 12:05pm

You may wish to investigate the index.routing.allocation.total_shards_per_node setting for this. This applies a hard limit to the number of shards per node of an index, which may interfere with the way that shards are allocated and lead to unassigned replicas, but when used sparingly it might help.
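
As a rough sketch of how a value for this setting might be chosen, the snippet below computes the smallest limit that still lets every shard copy be assigned, then builds the settings body. The setting name is the real one mentioned above; the helper functions and the example numbers (5 primaries, 1 replica, 4 data nodes) are assumptions for illustration.

```python
import json
import math


def min_total_shards_per_node(primaries, replicas, data_nodes):
    """Smallest per-node limit that still fits every shard copy."""
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / data_nodes)


def total_shards_per_node_settings(limit):
    """Index settings body, to be applied with PUT /<index>/_settings."""
    return {"index.routing.allocation.total_shards_per_node": limit}


# Example numbers only: 5 primaries with 1 replica each on 4 data nodes.
limit = min_total_shards_per_node(primaries=5, replicas=1, data_nodes=4)
print(json.dumps(total_shards_per_node_settings(limit)))
```

Using the bare minimum leaves no slack if a node drops out, which is how the unassigned replicas mentioned above can arise; a slightly higher limit trades some evenness for resilience.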

You may also be interested in a plan to improve this more generally. Your thoughts (or just a +1) on that issue would be welcome.

dadoonet · December 21, 2018, 12:12pm

You can also have an index with a single shard and allocate it wherever you want.
