Hey there, I'm currently working on a project that automates upgrading an Elasticsearch cluster (a typical cluster is ~20 nodes, the majority of which are master-eligible) to a higher minor version (e.g. 7.1.1 => 7.2.0). The upgrade needs to be fully automated, and cluster health can never drop below green while data is continuously being ingested. To reach these goals, I have chosen the following upgrade strategy (the cluster is hosted on Google Compute Engine and each node is a VM instance):
Assuming a cluster with n nodes, of which (n - 2) are master-eligible:
- Deploy a group of n VM instances running the newer Elasticsearch version;
- Retire a non-master node (does it have to be non-master?) from the existing cluster (this might involve re-allocating its shards to other running nodes in the cluster);
- Join a node from the newly created VM instances to the old cluster (a minimal config sketch for this step is right below this list);
- Repeat until the whole cluster is upgraded.
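For the "join a node" step, this is roughly how I planned to bootstrap each new node (just a sketch; the cluster name, node name, and seed-host IPs are placeholders for my real setup, and I deliberately don't set cluster.initial_master_nodes since the node is joining an existing cluster rather than bootstrapping a new one):

```
# Sketch: configure a freshly provisioned node so it joins the existing cluster.
# Cluster name, node name and IPs below are placeholders.
cat > /etc/elasticsearch/elasticsearch.yml <<'EOF'
cluster.name: my-es-cluster        # must match the existing cluster
node.name: es-new-01
node.master: true                  # master-eligible (pre-7.9 role syntax)
node.data: true
network.host: _site_
discovery.seed_hosts:              # a few nodes of the existing cluster
  - 10.128.0.11
  - 10.128.0.12
  - 10.128.0.13
# cluster.initial_master_nodes is intentionally NOT set: it is only for
# bootstrapping a brand-new cluster, not for joining an existing one.
EOF

systemctl start elasticsearch

# Confirm the new node shows up with the expected version
curl -s "http://10.128.0.11:9200/_cat/nodes?v&h=name,version,master"
```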
Right now, I have a few questions regarding the above steps:
- Is there anything extra I need to care about when removing a running master node? (From the docs it seems not, and I will ensure that a majority of master-eligible nodes, i.e. at least n / 2 + 1, always remains in the cluster.) I've sketched below how I planned to handle the voting configuration for this;
- Will the cluster health always remain green if data (10 - 15 GB / hour) is continuously streamed and indexed into the cluster? The health check I planned to gate each automation step on is also sketched below;
- Elasticsearch is set up as a system service on my VM instances. To safely remove a node (with shards allocated to it) from the cluster, is it enough to just run `systemctl stop elasticsearch`? The docs for older Elasticsearch versions suggest that the node must first be excluded from shard allocation (which causes the shards already on it to be re-allocated to other nodes), but I am not sure whether that is still required on 7.x. My current retire-one-node sequence is the last sketch below.
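To make the questions a bit more concrete: for the first one, when the node being retired is master-eligible I planned to add it to the voting configuration exclusions before shutting it down, roughly like this (a sketch; the node name is a placeholder, and I believe the path form below is the pre-7.8 flavour of the API, with later 7.x releases using a ?node_names= query parameter instead):

```
# Exclude the retiring master-eligible node from the voting configuration
curl -s -X POST "http://localhost:9200/_cluster/voting_config_exclusions/es-old-03"

# ... retire the node and join its replacement ...

# Clear the exclusion list once the replacement has joined and the cluster is stable
curl -s -X DELETE "http://localhost:9200/_cluster/voting_config_exclusions"
```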
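For the second question, my plan is to gate every automation step on a check along these lines (again just a sketch; host and timeout are placeholders):

```
# Block until the cluster is green and nothing is relocating; fail otherwise
# so the automation pauses/alerts instead of proceeding.
resp=$(curl -s "http://localhost:9200/_cluster/health?wait_for_status=green&wait_for_no_relocating_shards=true&timeout=300s")
echo "$resp" | grep -q '"timed_out":false' || { echo "cluster did not reach green in time"; exit 1; }
```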
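And for the third question, the retire-one-node sequence I have so far looks like the sketch below (node name and endpoint are placeholders); what I'd like to confirm is whether step 1 is still necessary on 7.x, or whether stopping the service is enough:

```
NODE="es-old-03"               # placeholder: node being retired
ES="http://localhost:9200"     # placeholder: any node answering REST calls

# 1. Ask the cluster to move all shards off the retiring node
curl -s -X PUT "$ES/_cluster/settings" -H 'Content-Type: application/json' \
  -d "{\"transient\":{\"cluster.routing.allocation.exclude._name\":\"$NODE\"}}"

# 2. Wait until no shards remain on that node
while [ "$(curl -s "$ES/_cat/shards?h=node" | grep -cw "$NODE")" -gt 0 ]; do
  sleep 10
done

# 3. Only now stop the service on the retiring node
systemctl stop elasticsearch

# 4. Once the replacement node has joined and health is green again,
#    clear the exclusion so it doesn't affect future allocation
curl -s -X PUT "$ES/_cluster/settings" -H 'Content-Type: application/json' \
  -d '{"transient":{"cluster.routing.allocation.exclude._name":null}}'
```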
Thanks! Any help is appreciated!!!