Shard relocation during blue/green deployment

I am currently setting up automation for a blue/green deployment of my Elasticsearch data nodes.

Nodes are launched with node.attr.stack: stack_id.

The cluster setting below is added so that when the green nodes are spun up and join the cluster, shards are not automatically relocated onto them.

cluster.routing.allocation.include.stack: blue_stack_id
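
For reference, the setting is applied roughly like this (a sketch, assuming a transient setting and a cluster reachable at localhost:9200; blue_stack_id is a placeholder for the real attribute value):

    # Pin allocation to nodes whose "stack" attribute matches the blue stack,
    # so newly joined green nodes receive no shards. blue_stack_id is a placeholder.
    curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
    {
      "transient": {
        "cluster.routing.allocation.include.stack": "blue_stack_id"
      }
    }'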

I am trying to create a script that reroutes shards using the reroute API and then updates the cluster setting to

cluster.routing.allocation.include.stack: green_stack_id

So how do I disable automatic relocation but still allow rerouting via API calls?
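
For illustration, this is the kind of reroute call the script would issue (the index name, shard number, and node names here are placeholders, not my real values):

    # Explicitly move shard 0 of index "my-index" from a blue node to a green node.
    # All names below are placeholders.
    curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d'
    {
      "commands": [
        {
          "move": {
            "index": "my-index",
            "shard": 0,
            "from_node": "blue-node-1",
            "to_node": "green-node-1"
          }
        }
      ]
    }'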

I tried disabling allocation and adding both the blue and green stack IDs for the hot and warm nodes to the routing.allocation.include setting:

        "cluster": {
        "routing": {
            "allocation": {
                "include": {
                    "warm_stack": "warmdata-v7-2-x-201907171912,warmdata-v7-2-x-201907171912",
                    "hot_stack": "hotdata-v7-2-x-201907181444, hotdata-v7-2-x-201907171910"
                },
                "enable": "none",
                "exclude": {
                    "_name": "rd-mas2-cl-elastic-*"
                }
            }
        }
    }

Response from _cat/nodeattrs:

rd-mas2-cl-es-hotdata-0e37100399e463765  aws_availability_zone us-east-1b
rd-mas2-cl-es-hotdata-0e5865ad259216f05  aws_availability_zone us-east-1b
rd-mas2-cl-es-warmdata-0033915e8315246c7 aws_availability_zone us-east-1b
rd-mas2-cl-es-warmdata-0fcdcc27f7f6c4937 aws_availability_zone us-east-1b
rd-mas2-cl-es-hotdata-04f19adba450d1d28  hot_stack             hotdata-v7-2-x-201907171910
rd-mas2-cl-es-hotdata-0e37100399e463765  hot_stack             hotdata-v7-2-x-201907171910
rd-mas2-cl-es-hotdata-0095d2196ebefa95c  hot_stack             hotdata-v7-2-x-201907181444
rd-mas2-cl-es-hotdata-0e5865ad259216f05  hot_stack             hotdata-v7-2-x-201907181444
rd-mas2-cl-es-warmdata-0033915e8315246c7 warm_stack            warmdata-v7-2-x-201907171912
rd-mas2-cl-es-warmdata-0f5e59edee811d0f0 warm_stack            warmdata-v7-2-x-201907171912
rd-mas2-cl-es-warmdata-09e92d1732b1f5bdb warm_stack            warmdata-v7-2-x-201907181824
rd-mas2-cl-es-warmdata-0fcdcc27f7f6c4937 warm_stack            warmdata-v7-2-x-201907181824
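
For reference, the output above came from a request along these lines (the exact column selection is my guess at reproducing it):

    # List each node's custom attributes, one attribute per row.
    curl -s "localhost:9200/_cat/nodeattrs?h=node,attr,value"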

But I am still getting errors like this:

{"error":{"root_cause":[{"type":"remote_transport_exception",
"reason":"[rd-mas2-cl-elastic-0b537725bd926dd48][IP_REDACTED.225:9300][cluster:admin/reroute]"}],
"type":"illegal_argument_exception","reason":"[move_allocation] can't move 0, 
from {rd-mas2-cl-es-warmdata-0033915e8315246c7}{ghEEKDZfS_qoo1E3rNHrxg}{PF4vaf7eRnO9y1pLBWDWAQ}{IP_REDACTED.199}{IP_REDACTED.199:9300}{aws_availability_zone=us-east-1b, warm_stack=warmdata-v7-2-x-201907171912, rack=cl-es-warmdata, xpack.installed=true}, 
to {rd-mas2-cl-es-warmdata-0fcdcc27f7f6c4937}{eVl6f7XURVSHKzoBJbUnKQ}{-jrlOOpGTRKbWH33lljgIg}{IP_REDACTED.213}{IP_REDACTED.213:9300}{aws_availability_zone=us-east-1b, warm_stack=warmdata-v7-2-x-201907181824, rack=cl-es-warmdata, xpack.installed=true},
since its not allowed, reason: 
[YES(shard has no previous failures)]
[YES(primary shard for this replica is already active)]
[YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)]
[YES(can allocate replica shard to a node with version [7.2.0] since this is equal-or-newer than the primary version [7.2.0])]
[YES(the shard is not being snapshotted)]
[YES(ignored as shard is not being recovered from a snapshot)]
[NO(node does not match cluster setting [cluster.routing.allocation.include] filters [warm_stack:\"warmdata-v7-2-x-201907171912 OR warmdata-v7-2-x-201907171912\",hot_stack:\"hotdata-v7-2-x-201907181444 OR hotdata-v7-2-x-201907171910\"])]
[YES(the shard does not exist on the same node)]
[YES(enough disk for shard on node, free: [193.1gb], shard size: [1.2mb], free after allocating shard: [193.1gb])]
[YES(below shard recovery limit of outgoing: [1 < 2] incoming: [0 < 2])]
[YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)]
[YES(node meets all awareness attribute requirements)]"}

Your move command is asking Elasticsearch to move a shard somewhere that it's not allowed to be according to the filters quoted in the message. But there's a more fundamental question: why not let Elasticsearch do this for you? If you set the filters correctly then Elasticsearch would automatically move these shards.
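
For instance, using the stack IDs from your nodeattrs output, something like the following sketch would let Elasticsearch drain the old nodes by itself (assuming the 2019-07-18 stacks are the green ones, and that you re-enable allocation at the same time):

    # Keep only the new stacks in the include filters and re-enable allocation;
    # Elasticsearch then relocates shards off the old nodes automatically.
    # Stack IDs are taken from the nodeattrs output above.
    curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
    {
      "transient": {
        "cluster.routing.allocation.enable": "all",
        "cluster.routing.allocation.include.hot_stack": "hotdata-v7-2-x-201907181444",
        "cluster.routing.allocation.include.warm_stack": "warmdata-v7-2-x-201907181824"
      }
    }'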

I was told that Elasticsearch may just relocate shards arbitrarily, and when the cluster is in the 25-50TB range there is a large AWS data transfer cost when data moves between availability zones.

I am trying to figure out the availability zones of the blue and green nodes so that I can reroute within an availability zone and reduce data transfer cost. I am now finding that this may not really solve it, as the cluster could copy data from the primary or from any one of the replica shards.
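
For context, the script reads each node's zone from the attributes the ec2 plugin publishes, roughly like this (the jq post-processing is just my illustration):

    # Map node name -> availability zone, using the node attributes
    # set by the EC2 discovery plugin.
    curl -s "localhost:9200/_nodes" \
      | jq -r '.nodes[] | "\(.name) \(.attributes.aws_availability_zone // "-")"'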

Can Elasticsearch relocate within the same availability zone wherever possible? Data transfer cost is zero when it stays within an availability zone.

Elasticsearch does not copy data from replica shards. Replicas are always built by copying data from the primary.

Blue/green deployment is a nice idea for little clusters but I would say it's pretty inappropriate for one with 25-50TB of data. Every new deployment will involve copying the entire dataset, and then copying it again (since you must relocate both primary and replica onto the new nodes). At the kind of scale you're talking about you will be better off keeping the data as stationary as possible.

This would most likely be once every 2 months, for OS patches and/or Elasticsearch upgrades. We would iterate in lower environments with a lot less data and then do one big push in higher environments.

Question: if node-a has a primary shard and node-b has a replica shard, and a reroute request to move the replica shard from node-b to node-c is issued, then Elasticsearch could copy either from node-b to node-c or from node-a to node-c. If the ec2 plugin is installed on all nodes, the node-b to node-c transfer would be within the same AZ and hence free, but node-a to node-c would cross AZs and hence AWS would charge data transfer cost.

So can we configure the Elasticsearch cluster to make sure that during relocation it tries to transfer within the same AZ, and only transfers across AZs when it can't?

Repeating my reply from above:

Elasticsearch doesn't really move the replica; it builds a new replica (by copying it from the primary) and then deletes the old replica.

OK, I get it now. Thanks for rephrasing it.

So if there are no write operations happening, it would be better to copy within the same AZ? Can that be accommodated?

It's not something I've heard requested before. I suspect that's because this kind of process is rather unusual when you have as much data as you do: it's more usual to try and keep the data more stationary. I don't think there's a way to do it in any version of Elasticsearch available today; it sounds technically feasible but would involve some fairly fundamental changes to achieve.
