Migrate ElasticSearch Cluster to arm64

Hello!

We have some big clusters running on amd64 VMs. Given that arm64 VMs are much cheaper, we are considering moving these clusters to arm64.

For example, for ES version 7.12.1, would it be possible to switch to arm64 VMs by creating the new arm64 VMs, deploying the arm64-compatible Elasticsearch build on them, adding the new arm64 VMs to the existing cluster that currently runs on amd64, evicting the data from the amd64 nodes to the arm64 nodes, and then removing the amd64 nodes from the running cluster?

Can this all be done while the cluster is up?

Thanks!

It should work as you describe, yes, although I do not think this is something we cover in any tests today.

Yes, it's possible to migrate your Elasticsearch 7.12.1 cluster from amd64 to arm64 without downtime. Follow these steps:

  1. Deploy ARM64 VMs and install Elasticsearch 7.12.1 (ARM build).
  2. Join ARM nodes to the cluster (same version and cluster.name).
  3. Use shard allocation filtering to evacuate data from AMD nodes:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "amd-node-name"
  }
}
  4. Wait for shard relocation to complete (GET _cat/shards; see the example below).
  5. Remove the AMD nodes once they hold no shards.
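One way to follow the relocation and confirm the excluded nodes are empty is with the _cat APIs (the exact column selection below is just a suggestion):

# Relocations currently in flight
GET _cat/recovery?v&active_only=true

# Where each shard is currently assigned, sorted by node
GET _cat/shards?v&h=index,shard,prirep,state,node&s=node

# Per-node shard counts and disk usage; the excluded AMD nodes should drop to zero shards
GET _cat/allocation?v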

Warning: Make sure all plugins are ARM-compatible and validate performance beforehand.
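To check that up front, you can list the installed plugins on every node and verify each one has an arm64-compatible build:

# Installed plugins on every node
GET _cat/plugins?v&h=name,component,version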

If you are only changing dedicated data nodes I believe it would be this simple. When it comes to replacing master nodes I think it is a bit more complex, as the voting configuration must be managed so that you never lose a majority of master-eligible nodes at any point.
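For what it's worth, in 7.x there is a voting configuration exclusions API for taking a master-eligible node out of the voting configuration before you stop it (strictly required only when removing half or more of the master-eligible nodes at once, but safe to use either way); the node name below is only a placeholder:

# Remove the outgoing master from the voting configuration before stopping it
POST _cluster/voting_config_exclusions?node_names=amd-master-1

# Once the node has been shut down, clear the exclusion list
DELETE _cluster/voting_config_exclusions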

The time to execute the above process, as written at least, scales with the number of data nodes in a cluster. If that's 5 nodes, not a big issue. If it's 50, it's a lengthy process when done one node at a time. Scriptable, maybe, but that does introduce some risk.
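If it helps, the exclusion setting accepts a comma-separated list (and wildcard patterns), so old data nodes can be drained in batches rather than strictly one at a time, disk space on the remaining nodes permitting; node names here are placeholders:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "amd-data-01,amd-data-02,amd-data-03"
  }
}

Using persistent instead of transient means the filter survives a full cluster restart, which may matter for a long-running migration.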

Don't forget to at least check ingress and egress too.

Note that Christian also referred to dedicated data nodes. Maybe you have a flavorful salad of nodes and node roles; if so, it might also make things a little more complex, with more pre- and post-checking to do.
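A cheap pre- and post-check for that is a node inventory with roles and versions; the column selection is just one option:

# Name, roles, version and elected master (*) for every node
GET _cat/nodes?v&h=name,ip,node.role,version,master&s=name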

Hello!

Thank you for all the answers!

We have dedicated data nodes (over 30), 3 ingest nodes and 3 dedicated master nodes. I assume that for the masters we could go with the same approach: adding one new arm64 master (which would be the fourth master in the cluster), then removing one amd64 master node, and so on for the others as well.

Thanks

That is possible. I would recommend you test the approach on a small test cluster first though in order to validate you have all steps covered. You do not want to have issues with the master nodes in a cluster that size.
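As a rough sketch (node names are placeholders, and this is not a tested recipe), each master swap could be: start the new arm64 master, confirm it has joined, exclude the outgoing amd64 master from voting, stop it, then clear the exclusion before the next swap:

# 1. Confirm the new arm64 master-eligible node has joined and see which node is elected
GET _cat/nodes?v&h=name,node.role,master

# 2. Exclude the outgoing amd64 master from the voting configuration
POST _cluster/voting_config_exclusions?node_names=amd-master-1

# 3. Shut the node down, check health, then clear the exclusion list
GET _cluster/health
DELETE _cluster/voting_config_exclusions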

I've already had a problem with this approach on master nodes. For a reason not yet identified, even after adding the new master to the nodes' yaml, the cluster became unstable when one of the existing masters was removed. I think it's a good idea to test the scenario first, because the error I experienced also occurred in a large cluster like yours.

Any problem in the process should be easy to identify from the logs and other troubleshooting outputs from Elasticsearch. I doubt the move to a different CPU architecture would cause anything you could call "instability"; if it doesn't work, it will fail outright. But there's no point in just talking about an error you experienced without sharing details of that error.