Cluster Rerouting - Moving Primaries

Hello Community,

I'm trying to move my primary shard from Node-C to Node-D, but I'm not able to perform this operation. Is there any way to accomplish this?

PUT rr_index2
{
  "settings": {
    "number_of_shards": 1,
  "number_of_replicas": 5
  }
}
rr_index2 0 r STARTED 0 230b 192.168.1.101 Node-A
rr_index2 0 r STARTED 0 230b 192.168.1.102 Node-B
rr_index2 0 p STARTED 0 230b 192.168.1.103 Node-C
rr_index2 0 r STARTED 0 230b 192.168.1.104 Node-D
rr_index2 0 r STARTED 0 230b 192.168.1.105 Node-E
rr_index2 0 r STARTED 0 230b 192.168.1.106 Node-F

Attempt to reroute the primary from Node-C to Node-D:

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "rr_index2", "shard": 0,
        "from_node": "Node-C", "to_node": "Node-D"
      }
    }
  ]
}

Result

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[Node-C][192.168.1.103:9300][cluster:admin/reroute]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "[move_allocation] can't move 0, from {Node-C}{_3Yavze-RHmK-FJo1_Tk2w}{XamQ0Q9LQEee14XUmSHpgg}{192.168.1.103}{192.168.1.103:9300}{ml.machine_memory=1915801600, xpack.installed=true, ml.max_open_jobs=20, ilm=hot, valhalla=alfheim}, to {Node-D}{YC_fql0yRfGONLFYmGdvMQ}{dEJQwWaBRCSf3-puI5J75Q}{192.168.1.104}{192.168.1.104:9300}{ml.machine_memory=1362153472, ml.max_open_jobs=20, xpack.installed=true, ilm=warm, valhalla=alfheim}, since its not allowed, reason: 
[YES(shard has no previous failures)]
[YES(shard is primary and can be allocated)]
[YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)]
[YES(can relocate primary shard from a node with version [7.2.1] to a node with equal-or-newer version [7.2.1])]
[YES(no snapshots are currently running)]
[YES(ignored as shard is not being recovered from a snapshot)]
[YES(node passes include/exclude/require filters)]
[NO(the shard cannot be allocated to the same node on which a copy of the shard already exists [[rr_index2][0], node[YC_fql0yRfGONLFYmGdvMQ], [R], s[STARTED], a[id=dyQH6U5JROqVzVvwnf0pwg]])]
[YES(enough disk for shard on node, free: [1.6gb], shard size: [230b], free after allocating shard: [1.6gb])]
[YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(node meets all awareness attribute requirements)]"
  },
  "status": 400
}

Additionally, if I set up an index with 2 shards and 5 replicas, how can I make the primaries live on the same node? Is there any way to accomplish this as well?

Let's say that I want my primaries to live on Node-C:

rr_index3 0 r STARTED 0 283b 192.168.1.101 Node-A
rr_index3 0 r STARTED 0 283b 192.168.1.102 Node-B
rr_index3 0 p STARTED 0 283b 192.168.1.103 Node-C
rr_index3 0 r STARTED 0 283b 192.168.1.104 Node-D
rr_index3 0 r STARTED 0 283b 192.168.1.105 Node-E
rr_index3 0 r STARTED 0 283b 192.168.1.106 Node-F

rr_index3 1 r STARTED 0 283b 192.168.1.101 Node-A
rr_index3 1 r STARTED 0 283b 192.168.1.102 Node-B
rr_index3 1 r STARTED 0 283b 192.168.1.103 Node-C
rr_index3 1 p STARTED 0 283b 192.168.1.104 Node-D
rr_index3 1 r STARTED 0 283b 192.168.1.105 Node-E
rr_index3 1 r STARTED 0 283b 192.168.1.106 Node-F

I noticed that if I set up an index with 7 shards and 5 replicas, one node ends up holding 2 primaries without the cluster having any issues, but if I try to manually move a primary to a different node, the operation fails.

In the example below, Node-C holds the primaries for shards 0 and 6.

PUT rr_index4
{
  "settings": {
    "number_of_shards": 7,
  "number_of_replicas": 5
  }
}
rr_index4 0 r STARTED 0 230b 192.168.1.101 Node-A
rr_index4 0 r STARTED 0 230b 192.168.1.102 Node-B
rr_index4 0 p STARTED 0 230b 192.168.1.103 Node-C
rr_index4 0 r STARTED 0   0b 192.168.1.104 Node-D
rr_index4 0 r STARTED 0 230b 192.168.1.105 Node-E
rr_index4 0 r STARTED 0   0b 192.168.1.106 Node-F

rr_index4 1 r STARTED 0 230b 192.168.1.101 Node-A
rr_index4 1 r STARTED 0   0b 192.168.1.102 Node-B
rr_index4 1 r STARTED 0 230b 192.168.1.103 Node-C
rr_index4 1 p STARTED 0 230b 192.168.1.104 Node-D
rr_index4 1 r STARTED 0   0b 192.168.1.105 Node-E
rr_index4 1 r STARTED 0 230b 192.168.1.106 Node-F

rr_index4 2 r STARTED 0 230b 192.168.1.101 Node-A
rr_index4 2 r STARTED 0 230b 192.168.1.102 Node-B
rr_index4 2 r STARTED 0 230b 192.168.1.103 Node-C
rr_index4 2 r STARTED 0 230b 192.168.1.104 Node-D
rr_index4 2 r STARTED 0 230b 192.168.1.105 Node-E
rr_index4 2 p STARTED 0 230b 192.168.1.106 Node-F

rr_index4 3 r STARTED 0 230b 192.168.1.101 Node-A
rr_index4 3 p STARTED 0 230b 192.168.1.102 Node-B
rr_index4 3 r STARTED 0 230b 192.168.1.103 Node-C
rr_index4 3 r STARTED 0 230b 192.168.1.104 Node-D
rr_index4 3 r STARTED 0   0b 192.168.1.105 Node-E
rr_index4 3 r STARTED 0 230b 192.168.1.106 Node-F

rr_index4 4 p STARTED 0 230b 192.168.1.101 Node-A
rr_index4 4 r STARTED 0 230b 192.168.1.102 Node-B
rr_index4 4 r STARTED 0 230b 192.168.1.103 Node-C
rr_index4 4 r STARTED 0 230b 192.168.1.104 Node-D
rr_index4 4 r STARTED 0 230b 192.168.1.105 Node-E
rr_index4 4 r STARTED 0 230b 192.168.1.106 Node-F

rr_index4 5 r STARTED 0   0b 192.168.1.101 Node-A
rr_index4 5 r STARTED 0 230b 192.168.1.102 Node-B
rr_index4 5 r STARTED 0 230b 192.168.1.103 Node-C
rr_index4 5 r STARTED 0   0b 192.168.1.104 Node-D
rr_index4 5 p STARTED 0 230b 192.168.1.105 Node-E
rr_index4 5 r STARTED 0   0b 192.168.1.106 Node-F

rr_index4 6 r STARTED 0 230b 192.168.1.101 Node-A
rr_index4 6 r STARTED 0 230b 192.168.1.102 Node-B
rr_index4 6 p STARTED 0   0b 192.168.1.103 Node-C
rr_index4 6 r STARTED 0 230b 192.168.1.104 Node-D
rr_index4 6 r STARTED 0 230b 192.168.1.105 Node-E
rr_index4 6 r STARTED 0 230b 192.168.1.106 Node-F

You can't have a primary shard and a replica of the same shard stored on the same node.
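
If you want to double-check which node holds which copy of a given shard, you can narrow the same _cat/shards output you already pasted down to one index and a few columns (these are standard _cat columns):

GET _cat/shards/rr_index2?v&h=index,shard,prirep,state,node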

In your first case the index has 1 primary shard and 5 replicas, so in total you have 6 copies of the shard (1p + 5r). Since you have exactly 6 nodes, every node already holds a copy, so there is nowhere the primary can move to; this is what the error tells you:

NO(the shard cannot be allocated to the same node on which a copy of the shard already exists [[rr_index2][0], node[YC_fql0yRfGONLFYmGdvMQ], [R], s[STARTED], a[id=dyQH6U5JROqVzVvwnf0pwg]])
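
If you really need that primary on a specific node, one workaround (just a sketch, not something from your setup) would be to temporarily lower the replica count so that one node is left without a copy, and then issue the move:

PUT rr_index2/_settings
{
  "index": {
    "number_of_replicas": 4
  }
}

Note that Elasticsearch decides which replica to drop, so check _cat/shards afterwards: the move will only succeed if the target node is the one that was freed. Once the primary has moved, you can set number_of_replicas back to 5.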

In the second case the index has 7 primary shards, each with 5 replicas, so in total you have 42 shard copies (7p + 35r). Since you only have 6 nodes, at least one node will end up holding two primaries. This is normal, Elasticsearch simply balances the shards across the nodes, and it is not a problem.

You can't move a primary shard in this situation either, because every other node already holds a replica of that same shard.
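
If you want Elasticsearch to spell out these deciders for a specific copy (instead of reading them out of the reroute error), the cluster allocation explain API does exactly that, for example:

GET _cluster/allocation/explain
{
  "index": "rr_index2",
  "shard": 0,
  "primary": true
}

For a started shard it tells you whether the copy can remain where it is and, node by node, which decider would block a move; in your case, the same same-shard decider you already saw.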

Any particular reason to move the shard?

