Hello! When doing a rolling upgrade (see "Rolling upgrades" in the Elasticsearch Guide [7.3]), is checking that there are 0 initializing and 0 relocating shards a reliable way to decide when the next node can be restarted, as opposed to checking that the "status" of the cluster is "green"?
Background
I'm writing a script to do a rolling upgrade of my Elasticsearch 7.3.2 cluster deployed in AWS. For me that means (at a high level):
for each EC2 instance:
1. terminate the instance
2. wait for an ASG to spin up an instance to replace the terminated one
3. wait for ES cluster to be "healthy" before terminating the next node
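
The loop above could be sketched roughly as follows. This is only an outline under stated assumptions: `rolling_replace` and its three callables (`terminate`, `wait_for_replacement`, `wait_until_healthy`) are hypothetical names standing in for whatever AWS and Elasticsearch calls the real script makes.

```python
from typing import Callable, Iterable, List


def rolling_replace(
    instance_ids: Iterable[str],
    terminate: Callable[[str], None],
    wait_for_replacement: Callable[[str], None],
    wait_until_healthy: Callable[[], None],
) -> List[str]:
    """Replace instances one at a time, gating each step on cluster health.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    replaced = []
    for instance_id in instance_ids:
        terminate(instance_id)             # 1. terminate the instance
        wait_for_replacement(instance_id)  # 2. wait for the ASG replacement
        wait_until_healthy()               # 3. wait for the cluster to be "healthy"
        replaced.append(instance_id)
    return replaced
```

Passing the steps in as callables keeps the ordering logic separate from the health check, which is the part the question is about.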
My question is about (3). The rolling upgrade documentation seems to say that you should first wait for the cluster to become "green", and that if it stays "yellow" you can continue the rolling upgrade once there are no initializing or relocating shards. Instead of first checking that the cluster is "green", can I just check that there are no initializing or relocating shards? That would make the script logic simpler.
Excerpt from the documentation:
Before upgrading the next node, wait for the cluster to finish shard allocation. You can check progress by submitting a _cat/health request. Wait for the status column to switch from yellow to green. Once the node is green, all primary and replica shards have been allocated.
IMPORTANT:
During a rolling upgrade, primary shards assigned to a node running the new version cannot have their replicas assigned to a node with the old version. The new version might have a different data format that is not understood by the old version. If it is not possible to assign the replica shards to another node (there is only one upgraded node in the cluster), the replica shards remain unassigned and status stays yellow. In this case, you can proceed once there are no initializing or relocating shards (check the init and relo columns). As soon as another node is upgraded, the replicas can be assigned and the status will change to green.
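
To make the two options concrete, here is a minimal sketch of the checks being compared. It is not a tested upgrade script. It assumes the health data comes from the _cluster/health API (whose JSON response includes `status`, `initializing_shards` and `relocating_shards`) rather than from _cat/health, and that the cluster is reachable at `http://localhost:9200` without authentication. The function names, polling interval and timeout are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_health(base_url: str = "http://localhost:9200") -> Dict:
    """Return the parsed _cluster/health response (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def no_shard_movement(health: Dict) -> bool:
    """The simpler check: no initializing and no relocating shards."""
    return health["initializing_shards"] == 0 and health["relocating_shards"] == 0


def ready_for_next_node(health: Dict) -> bool:
    """The rule as the excerpt reads: green, or else no init/relo shards."""
    return health["status"] == "green" or no_shard_movement(health)


def wait_until_ready(
    get_health: Callable[[], Dict],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until ready_for_next_node() holds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while not ready_for_next_node(get_health()):
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster did not become ready in time")
        sleep(poll_seconds)
```

A call such as `wait_until_ready(fetch_health)` would implement step (3). Dropping the "green" check amounts to polling on `no_shard_movement` alone; as written, the two predicates only disagree when the status is green while `relocating_shards` is non-zero.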
Thanks for any and all advice!