Fully automated upgrading of Elasticsearch cluster

Hey there, I am currently working on a project that aims to automate upgrading an Elasticsearch cluster (a typical cluster size is ~20 nodes, the majority of which are master-eligible) to a higher minor version (e.g. 7.1.1 => 7.2.0). Upgrading needs to be fully automated, and cluster health can never drop below green (while data is continuously being ingested). To reach these end goals, I have chosen the following upgrading strategy (the cluster is hosted on Google Compute Engine and each node is a VM instance):

Assuming a cluster with n nodes, (n - 2) of which are master-eligible:

  1. Deploy a group of n VM instances with newer version Elasticsearch;
  2. Retire a non-master (does it have to be non-master?) node from the existing cluster (might involve re-allocating its shards to other running nodes in the cluster);
  3. Join a node from the newly created VM instances to the old cluster;
  4. Repeat, until the whole cluster is upgraded

Right now, I have a few questions regarding the above steps:

  1. Is there anything extra I need to take care of while removing a running master node? (From the docs it seems not, and I will ensure that there are always at least (n / 2) + 1 master-eligible nodes in the cluster.)
  2. Will the cluster health always remain green if data (10 - 15 GB / hour) is continuously streamed and indexed into the cluster?
  3. Elasticsearch is set up as a systemd service on my VM instances. To safely remove a node (with shards allocated to it) from the cluster, is it enough to just run systemctl stop elasticsearch? The docs for older versions of Elasticsearch suggest that the node must first be excluded from shard allocation (which causes the shards already on it to be re-allocated to other nodes), but I am not sure about Elasticsearch 7.x.

Thanks! Any help is appreciated!!!

That's a strange choice. If a typical cluster size is ~20 nodes then you are saying you typically have ~18 master-eligible ones. You'd normally only need 3 master-eligible nodes. The more master-eligible nodes you have, the more conflicts you will see at election time.

If you stop a node while it still has shards allocated to it, the cluster health will no longer be green, since the shards on that node will be unassigned for a while. Since you require green health at all times, you must vacate all the shards off each node with allocation filtering first. It's not clear why you want green health at all times, by the way; it's normally considered acceptable (and much cheaper) to allow it to drop to yellow during maintenance.
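To be concrete, vacating a node with allocation filtering is a single settings change. A sketch in Python (the node name is made up; in practice you would PUT this body to /_cluster/settings and then watch /_cluster/health until nothing is relocating):

```python
import json

def vacate_body(node_name):
    # Ask the allocator to move every shard off the named node
    # (cluster-level allocation filtering by node name).
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._name": node_name
        }
    }

# Hypothetical node name; PUT this body to /_cluster/settings, then wait
# for /_cluster/health to report green with 0 relocating shards.
print(json.dumps(vacate_body("old-data-3"), indent=2))
```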

You'd find it much quicker to add all the new nodes to the cluster at once, vacate all the old nodes at once, and then shut the old nodes down. The only tricky bit with that is, as you observe, being sure not to shut the master-eligible nodes down too quickly. You can remove the old master-eligible nodes one at a time if you want, but then you also have to build in an appropriate wait (i.e. something like polling the cluster state API), and it's normally simpler to use the voting configuration exclusions API, since that already does the waiting for you and lets you shut down all the old masters at once.
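In request terms, that works out to something like the following sketch (node names are made up; the URL shape is the 7.x voting config exclusions API, and each POST only returns once the node is actually out of the voting configuration):

```python
def voting_exclusion_requests(old_masters):
    # One POST per old master-eligible node; each call blocks until the
    # node has left the voting configuration, so no manual polling needed.
    posts = [f"POST /_cluster/voting_config_exclusions/{name}"
             for name in old_masters]
    # Clear the exclusion list after the old masters are shut down.
    return posts + ["DELETE /_cluster/voting_config_exclusions"]

for req in voting_exclusion_requests(["old-master-1", "old-master-2"]):
    print(req)
```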

I should point out that Elasticsearch is really better suited to be upgraded in-place rather than migrating all the data like this. On GCE you will find that the cross-zone data transfer costs ($0.01/GB at time of writing) start to add up quite quickly when migrating your entire dataset onto new nodes. If every primary has one replica in a different zone for resilience then a whole-cluster migration may involve copying your entire dataset between zones twice.

Hello David, thanks again for your reply! By saying (n - 2) nodes are master-eligible I meant that I set almost all nodes' node.master to true before the cluster is bootstrapped for the first time. But just as you said, setting a high number of master-eligible nodes might cause more conflicts during elections, so I guess I will just set a smaller number (~3) of nodes as master-eligible when a new cluster is deployed.

Reading the API docs you referenced, it seems I can change the workflow of upgrading a cluster to the following (since the nodes' machine images come from an IaC tool, an in-place upgrade is not possible):

  1. Deploy a group of n VM instances with the newer Elasticsearch version, and set 3 of them as master-eligible (so they can elect a master efficiently after the whole cluster is upgraded);

  2. Join the entire newly created node group to the existing cluster;

  3. Remove old nodes from the cluster one at a time. Call the allocation filtering API to exclude each node from shard allocation before it is removed, to make sure there is no data loss, and:

    i. if the node to be removed is master-eligible, exclude it from voting configuration before it is removed;

    ii. otherwise, simply do systemctl stop elasticsearch on the node to retire it from cluster (the VM instance will be destroyed later)
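To make sure I have the order of operations right, here is a rough sketch of step 3 (the node name and step descriptions are just illustrative):

```python
def retire_plan(node_name, master_eligible):
    # Ordered actions for retiring one old node, per step 3 above.
    steps = [
        # Drain the node first so no data is lost.
        f"PUT /_cluster/settings with exclude._name={node_name}",
        "poll /_cluster/health until green with 0 relocating shards",
    ]
    if master_eligible:
        # Step 3.i: remove it from the voting configuration first.
        steps.append(f"POST /_cluster/voting_config_exclusions/{node_name}")
    # Step 3.ii: stop the service; the VM instance is destroyed later.
    steps.append("systemctl stop elasticsearch")
    return steps

for step in retire_plan("old-node-1", master_eligible=True):
    print(step)
```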

How does this upgrading strategy look? Again, thanks for replying to this question and the one before it!

Sounds good. You may also want to consider using dedicated master nodes. Mixed data/master nodes are ok if a cluster is small or lightly-loaded, but at scale it tends to be useful to keep the roles separate. From the docs:

Indexing and searching your data is CPU-, memory-, and I/O-intensive work which can put pressure on a node’s resources. To ensure that your master node is stable and not under pressure, it is a good idea in a bigger cluster to split the roles between dedicated master-eligible nodes and dedicated data nodes.

While master nodes can also behave as coordinating nodes and route search and indexing requests from clients to data nodes, it is better not to use dedicated master nodes for this purpose. It is important for the stability of the cluster that master-eligible nodes do as little work as possible.
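For reference, on 7.x (before 7.9 introduced the node.roles setting) a dedicated master is just a combination of role flags in elasticsearch.yml, along these lines:

```yaml
# Dedicated master-eligible node: holds no data, does no ingest.
node.master: true
node.data: false
node.ingest: false
```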

Regarding this:

Remove old nodes from the cluster one at a time.

Why do this one at a time? It would be quicker to exclude all the old nodes at once, since the shards can move in parallel that way. Then once the shards have all finished moving and the voting config exclusions are in place you can just shut everything down in one step.
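Concretely, the all-at-once version is a single settings update naming every old node, plus a simple done-check. A sketch (node names are made up):

```python
def vacate_all_body(old_nodes):
    # Exclude every old node in one update so shards migrate in parallel;
    # the setting takes a comma-separated list of node names.
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._name": ",".join(old_nodes)
        }
    }

def migration_finished(health):
    # `health` is the parsed response from /_cluster/health: done when the
    # cluster is green again and no shards are still relocating.
    return health["status"] == "green" and health["relocating_shards"] == 0

body = vacate_all_body(["old-1", "old-2", "old-3"])
print(body["persistent"]["cluster.routing.allocation.exclude._name"])
print(migration_finished({"status": "green", "relocating_shards": 0}))
```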