Elasticsearch hosts upgrade - options

Hello,

We have 3 Master nodes and 5 data nodes in our cluster.

The VMs hosting the entire cluster are undergoing an upgrade which will involve downtime (a few hours) and also an IP change for the 3 Master nodes and 1 data node.

We have the option of doing this upgrade piecemeal and are thinking of getting the whole exercise done in 2 phases.

What would be the best option to pick when it comes to selecting the nodes?

I was thinking of the following:

1st Phase

1 Master node and 3 data nodes

2nd Phase

2 Master nodes and 2 data nodes

Please guide.

Thanks

I would say that you will probably need to do this in more steps since the downtime can take hours.

The safest option is to exclude the data node from allocation before shutting it down.

For example:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.exclude._name" : "node-name"
  }
}

After you run that request, the shards will start to move to the other nodes. Once the node is empty, you can shut it down.
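You can watch the node drain with something like the following (just a convenience check; the v and h parameters only control the output columns of the _cat/allocation API):

GET _cat/allocation?v&h=node,shards,disk.indices

When the shards column for the excluded node reaches 0, it is safe to shut it down.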

When the upgrade is finished and the node is back up, you need to clear the allocation setting to allow the node to receive shards again.

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.exclude._name" : null
  }
}

When the shards are reallocated, you can repeat the process for the other data nodes.
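You can follow the relocation progress with the cluster health API, for example (filter_path just trims the response to the relevant fields):

GET _cluster/health?filter_path=status,relocating_shards,unassigned_shards

Once relocating_shards drops back to 0, the data has settled and you can move on to the next node.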

If your master nodes are master-only, you do not need to exclude them from allocation and can just shut the node down, but you need to keep at least two of the three master nodes up at all times so the cluster can still elect a master.
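If it helps, you can also check which node is currently the elected master before each restart, since taking that one down will trigger a brief re-election (just a convenience, not required):

GET _cat/master?v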

thanks a lot! @leandrojmp

Can the "exclude data node" command take 2 nodes at a time (comma separated) as below:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.exclude._name" : ["node1-name", "node2-name"]
  }
}

Yes, but it is not an array; it is just a comma-separated string.

"node1-name, node2-name"

Also, to do that you need to make sure that the remaining nodes have enough free space to receive the data from the two nodes.
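You can check the free space per node with something like this (column names per the _cat/nodes API):

GET _cat/nodes?v&h=name,disk.used_percent,disk.avail,disk.total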

thank you!

I am trying this on one node in a lower environment. It has been running for almost 2.5 hours now and ~40% of the shards still remain to be moved.

Yeah, it is normal, it can take time.

Thanks. It took 8 hours for 1512 shards to migrate from a node that had both roles (Master and data).

Would it take the same or similar time if I choose to do 2 hosts/nodes together, considering I have disk space available? These 2 nodes are dedicated data nodes.

How many shards do you have in the cluster? What is the average shard size?

Hello Christian. 1465 shards with an average size of 1.3 GB.

I calculated it by executing the following and taking the average of the result. Hope this is the right way:

GET _cat/shards?v=true&h=index,prirep,shard,store&s=prirep,store&bytes=gb&s=store:desc

Thanks
