How does Elasticsearch move shards from hot to warm (or warm to hot) nodes?

Hi,

I wonder how shard relocation from hot to warm nodes happens.
I have a test cluster with 2 hot, 4 warm, 2 master, and 1 client node. I am using multiple indices with different shard counts (1, 2, 4, 8, 16 shards) to check how the shards move from one node to another. What I've seen is that as the number of shards increases, more shards (primary or replica) are transferred to the target node. It would be great if someone could answer the following questions.

  1. Who controls shard movement and takes care of shard relocation: the master or the client node?
  2. It relocates only some number of shards at a time, not all at once. How does Elasticsearch decide which shards to transfer and how many to transfer at a given time?
  3. If we relocate multiple indices from hot to warm, how does Elasticsearch decide which index to transfer first?

Thanks,

The master.

The number of concurrent recoveries defaults to 2 incoming/outgoing (at each node) to ensure your cluster is not overloaded with recoveries. The order in which shards are relocated isn't really defined anywhere, nor is it something on which you should rely as it might change from version to version.

It isn't really defined anywhere, nor is it something on which you should rely. Why do you want to know this, out of interest?

Thanks, David, for the reply.

Below is a snapshot of the shard relocation. We can see that the hot-1 node is relocating 4 shards (> 2) at this particular instant. Does Elasticsearch control this according to cluster health, and can we change the default number of concurrent recoveries (through some property or configuration)?


index_16 2 p RELOCATING 1250416 530.5mb dev-es-hot-1 -> eogqYqiTTPy11klul6ErKA dev-es-warm-3
index_16 14 r RELOCATING 1249982 529.9mb dev-es-hot-2 -> i0U_2m59Q2GA5zvqfy7QzA dev-es-warm-4
index_16 1 r RELOCATING 1248257 529.1mb dev-es-hot-1 -> Q5OPHgH7Q_SjdW2S20PVyA dev-es-warm-2
index_16 12 r RELOCATING 1247503 530.5mb dev-es-hot-1 -> eogqYqiTTPy11klul6ErKA dev-es-warm-3
index_16 12 p RELOCATING 1247503 530.5mb dev-es-hot-2 -> Q5OPHgH7Q_SjdW2S20PVyA dev-es-warm-2
index_16 4 p RELOCATING 1252275 530.8mb dev-es-hot-1 -> i0U_2m59Q2GA5zvqfy7QzA dev-es-warm-4
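
One way to read a listing like the one above is to tally the relocations per source node and per target node. The Python sketch below assumes every line has the columns shown here (index, shard, primary/replica flag, state, docs, store, source node, `->`, target node ID, target node name); the function name `count_relocations` is made up for illustration.

```python
from collections import Counter

LISTING = """\
index_16 2 p RELOCATING 1250416 530.5mb dev-es-hot-1 -> eogqYqiTTPy11klul6ErKA dev-es-warm-3
index_16 14 r RELOCATING 1249982 529.9mb dev-es-hot-2 -> i0U_2m59Q2GA5zvqfy7QzA dev-es-warm-4
index_16 1 r RELOCATING 1248257 529.1mb dev-es-hot-1 -> Q5OPHgH7Q_SjdW2S20PVyA dev-es-warm-2
index_16 12 r RELOCATING 1247503 530.5mb dev-es-hot-1 -> eogqYqiTTPy11klul6ErKA dev-es-warm-3
index_16 12 p RELOCATING 1247503 530.5mb dev-es-hot-2 -> Q5OPHgH7Q_SjdW2S20PVyA dev-es-warm-2
index_16 4 p RELOCATING 1252275 530.8mb dev-es-hot-1 -> i0U_2m59Q2GA5zvqfy7QzA dev-es-warm-4
"""


def count_relocations(listing):
    """Return (outgoing, incoming) Counters keyed by node name."""
    outgoing, incoming = Counter(), Counter()
    for line in listing.splitlines():
        # Skip anything that is not a relocating shard line.
        if "RELOCATING" not in line or "->" not in line:
            continue
        left, right = line.split("->", 1)
        source = left.split()[-1]   # last column before the arrow
        target = right.split()[-1]  # node name follows the node ID
        outgoing[source] += 1
        incoming[target] += 1
    return outgoing, incoming


outgoing, incoming = count_relocations(LISTING)
print("listed as source:", dict(outgoing))
print("listed as target:", dict(incoming))
```

For this snapshot the tally is four relocations listed against dev-es-hot-1 and two against dev-es-hot-2, with two arriving at each of the three warm nodes. The node listed for a relocating replica is where the old copy lives, so this tally may not match the per-node recovery count that the throttle actually applies to.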

It isn't really defined anywhere, nor is it something on which you should rely. Why do you want to know this, out of interest?

I was just wondering if there is any algorithm or strategy that governs this movement.

I was going through the below blog:

According to this blog, all new indexing operations are sent to the new primary (the target). If we run a query during this relocation, how does Elasticsearch make sure it returns correct results, given that the old primary (the source) does not contain the new writes?

In the shard relocation process, once the primary has finished moving, is the replica always copied from the primary rather than from the old replica node (the source)?

It doesn't really depend on cluster health, no, but there are settings to adjust this. To be clear, if you set these much higher than the defaults then you are putting the stability of your cluster at risk.
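
The reply does not name the settings. As far as I know, the relevant ones are `cluster.routing.allocation.node_concurrent_incoming_recoveries` and `cluster.routing.allocation.node_concurrent_outgoing_recoveries` (both default to 2), but check the documentation for your version. The Python sketch below only builds the request body; the helper name `recovery_settings_body` is hypothetical, and nothing here talks to a cluster.

```python
import json


def recovery_settings_body(incoming=2, outgoing=2):
    """Build a JSON body for the cluster update settings API.

    The defaults mirror the documented default of 2 per node.
    """
    return {
        "persistent": {
            "cluster.routing.allocation.node_concurrent_incoming_recoveries": incoming,
            "cluster.routing.allocation.node_concurrent_outgoing_recoveries": outgoing,
        }
    }


body = recovery_settings_body()
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```

The printed body would be sent with `PUT _cluster/settings`. Raising the values well above the defaults carries the stability risk described above.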

Not really, no, Elasticsearch just keeps on moving shards until they're all in the right places.

Elasticsearch keeps on writing to the old primary until the new primary is ready to take over.

I don't really understand. The replicas are already up-to-date so there's no need to copy them from anywhere.

Elasticsearch keeps on writing to the old primary until the new primary is ready to take over.

Does this mean that when Elasticsearch relocates an index from hot to warm nodes, all new indexing operations are sent to both primary copies (on the source and target nodes)?
Quote from the blog: "In 2.x/5.x, we're better. As soon as we start relocation, primary will start to send all indexing operations to the new primary(node5)."

I don't really understand. The replicas are already up-to-date so there's no need to copy them from anywhere.

When Elasticsearch relocates an index from a hot node to a warm node, it first moves the primary shard from hot to warm and then relocates the replica. My question was whether the replica shard is also copied from the hot node to the warm node, or whether it is copied from the primary.

Yes, pretty much. The "target" copy is more like a replica than a primary for most of the relocation process, and like all other replicas it receives all indexing operations. However, this question is a bit strange because you wouldn't normally be indexing into an index that's being moved to a warm node.

I see. Yes, a new replica is always built by copying information from the current primary wherever that might be.

Hi David,

One more question: how are the primary and replica shards transferred? Is it in parallel or sequentially?
We have seen different results for different shard counts. We are using the command below to capture the time taken for shard movement.

GET /index_3/_recovery?human&detailed=true

The output of this command contains start_time and stop_time, which give the exact timing of each shard movement. Below is the consolidated data for some sample indices.

| index | number of shards | time taken per shard | start_time | stop_time | total time for index |
|---|---|---|---|---|---|
| index_1 | 1 | 6.6~6.7m | 2019-06-03T11:37:44.679Z | 2019-06-03T11:44:32.222Z | 0:06:48 |
| index_2 | 2 | 2.2~3.4m | 2019-06-03T11:50:04.463Z | 2019-06-03T11:59:09.779Z | 0:09:05 |
| index_3 | 4 | 1.6~1.7m | 2019-06-03T11:59:17.880Z | 2019-06-03T12:06:15.461Z | 0:06:58 |
| index_4 | 8 | 34.3s~1.5m | 2019-06-03T12:22:02.480Z | 2019-06-03T12:27:13.280Z | 0:05:11 |
| index_16 | 16 | 16.4s~1.7m | 2019-06-03T12:27:43.571Z | 2019-06-03T12:37:52.864Z | 0:10:09 |

If we check the start and stop time per shard, we can see that some shards are relocated in parallel while others are relocated sequentially. It would be a great help if you could explain this behavior.
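
For reference, the "total time for index" column can be reproduced from the start_time and stop_time values in the data above. This is a small Python sketch; the helper name `format_elapsed` is made up, and it assumes the timestamps are always UTC in the format shown.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# (index, start_time, stop_time) copied from the data above.
ROWS = [
    ("index_1", "2019-06-03T11:37:44.679Z", "2019-06-03T11:44:32.222Z"),
    ("index_2", "2019-06-03T11:50:04.463Z", "2019-06-03T11:59:09.779Z"),
    ("index_3", "2019-06-03T11:59:17.880Z", "2019-06-03T12:06:15.461Z"),
    ("index_4", "2019-06-03T12:22:02.480Z", "2019-06-03T12:27:13.280Z"),
    ("index_16", "2019-06-03T12:27:43.571Z", "2019-06-03T12:37:52.864Z"),
]


def format_elapsed(start_time, stop_time):
    """Return stop - start as H:MM:SS, rounded to the nearest second."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    stop = datetime.strptime(stop_time, TIME_FORMAT)
    seconds = round((stop - start).total_seconds())
    return str(timedelta(seconds=seconds))


for name, start_time, stop_time in ROWS:
    print(name, format_elapsed(start_time, stop_time))
```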

We have a cluster with 2 hot nodes and 4 warm nodes. We are relocating an index with 2 primary shards and 1 replica per shard (4 shard copies in total). After the relocation, when we check the _recovery output, it shows a hot node as source and a warm node as target for some shards, but a warm node as both source and target for others.
For example:
"source": {
"id": "eogqYqiTTPy11klul6ErKA",
"name": "dev-es-warm-3"
},
"target": {
"id": "Q5OPHgH7Q_SjdW2S20PVyA",
"name": "dev-es-warm-2"
}

"source": {
"id": "IoyUloOfR1acbnkE2Bv1-w",
"name": "dev-es-hot-1"
},
"target": {
"id": "i0U_2m59Q2GA5zvqfy7QzA",
"name": "dev-es-warm-4"
}

I am a little confused about the replica shard movement. Is it parallel or sequential?

It's both. Recoveries happen in parallel, up to a limit that prevents the cluster from spending too many resources on recoveries. Once the limit is reached, further recoveries have to wait for the ongoing ones to finish.
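
To see this "parallel up to a limit" pattern in per-shard timings, one option is to compute the peak number of recoveries that overlap in time. The Python sketch below does this with a simple sweep over start and stop events. The interval values are invented for illustration (seconds from an arbitrary origin), not taken from this thread, and `peak_concurrency` is a hypothetical helper name.

```python
def peak_concurrency(intervals):
    """Return the largest number of intervals active at the same time.

    Each interval is a (start, stop) pair. A recovery that stops at the
    same instant another one starts is not counted as overlapping.
    """
    events = []
    for start, stop in intervals:
        events.append((start, 1))   # a recovery begins
        events.append((stop, -1))   # a recovery ends
    # Sort by time; at equal times, ends (-1) sort before starts (+1).
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Made-up example: two recoveries run side by side, and the next two
# begin only once a slot is free.
example = [(0, 90), (0, 95), (90, 180), (95, 200)]
print(peak_concurrency(example))
```

With real data, the intervals would come from the per-shard start and stop times reported by the _recovery output.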

Okay. One more thing: is it always guaranteed that the primary shard relocates first, or can a replica also move from the hot node to the warm node first?

No, I don't think there is any such guarantee.
