Checking elasticsearch health during rolling restart

lag13 · January 14, 2020, 5:19pm

Hello! When doing a rolling upgrade (Rolling upgrades | Elasticsearch Guide [7.3] | Elastic), is checking that there are 0 initializing and relocating shards a reliable way to determine if you can restart the next node as opposed to checking that the "status" of the cluster is "green"?

Background

I'm writing a script to do a rolling upgrade on my 7.3.2 elasticsearch cluster deployed in AWS. For me that means (at a high level):

for each ec2 instance:
  1. terminate the instance
  2. wait for an ASG to spin up an instance to replace the terminated one
  3. wait for ES cluster to be "healthy" before terminating the next node
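
The loop above could be sketched roughly as follows. This is a minimal sketch, not a finished script: `terminate_instance`, `wait_for_replacement`, and `cluster_is_healthy` are hypothetical callables you would supply yourself (for example, wrappers around the AWS and Elasticsearch APIs), and the polling interval and timeout are arbitrary placeholder values.

```python
import time
from typing import Callable, Iterable


def rolling_replace(
    instance_ids: Iterable[str],
    terminate_instance: Callable[[str], None],    # hypothetical: step 1
    wait_for_replacement: Callable[[str], None],  # hypothetical: step 2
    cluster_is_healthy: Callable[[], bool],       # hypothetical: step 3 check
    poll_seconds: float = 10.0,                   # assumed polling interval
    timeout_seconds: float = 1800.0,              # assumed upper bound per node
) -> None:
    """Replace instances one at a time, waiting for health between each."""
    for instance_id in instance_ids:
        terminate_instance(instance_id)
        wait_for_replacement(instance_id)

        # Poll until the cluster is considered healthy, or give up.
        deadline = time.monotonic() + timeout_seconds
        while not cluster_is_healthy():
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"cluster not healthy after replacing {instance_id}"
                )
            time.sleep(poll_seconds)
```

Raising on timeout stops the loop before the next node is terminated, so a cluster that never recovers is not degraded further.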

My question revolves around (3). The rolling upgrade documentation suggests first waiting for the cluster to become "green", but if it does not, you can continue the rolling upgrade once there are no initializing or relocating shards. Instead of first checking that the cluster is "green", can I just check that there are no initializing or relocating shards? That would make the script logic simpler.

Excerpt from the documentation:

Before upgrading the next node, wait for the cluster to finish shard allocation. You can check progress by submitting a _cat/health request. Wait for the status column to switch from yellow to green. Once the node is green, all primary and replica shards have been allocated.
IMPORTANT:
During a rolling upgrade, primary shards assigned to a node running the new version cannot have their replicas assigned to a node with the old version. The new version might have a different data format that is not understood by the old version. If it is not possible to assign the replica shards to another node (there is only one upgraded node in the cluster), the replica shards remain unassigned and status stays yellow. In this case, you can proceed once there are no initializing or relocating shards (check the init and relo columns). As soon as another node is upgraded, the replicas can be assigned and the status will change to green.
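
The rule in the excerpt could be expressed as a single check against the JSON returned by `GET _cluster/health`. This is a minimal sketch under a few assumptions: the base URL is a placeholder, and the `expected_nodes` guard is an addition not found in the excerpt. It is there because the initializing and relocating counts can both be 0 while a node is still missing and its shards sit unassigned.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_cluster_health(base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """Fetch cluster health as a dict. The base URL is an assumed placeholder."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def safe_to_proceed(health: Dict[str, Any], expected_nodes: int) -> bool:
    """Decide whether the next node can be restarted."""
    # Assumption (not in the docs excerpt): require every node to have rejoined.
    if health["number_of_nodes"] < expected_nodes:
        return False
    # Green: all primary and replica shards are allocated.
    if health["status"] == "green":
        return True
    # Yellow: acceptable only once nothing is initializing or relocating.
    if health["status"] == "yellow":
        return (
            health["initializing_shards"] == 0
            and health["relocating_shards"] == 0
        )
    # Red (or anything unexpected): do not proceed.
    return False
```

Usage would look like `safe_to_proceed(fetch_cluster_health(), expected_nodes=3)`, where 3 stands in for your own node count. Note that a green cluster also has 0 initializing and relocating shards, so the yellow branch is the only place where the two checks differ.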

Thanks for any and all advice!

rugenl · January 14, 2020, 5:35pm

One issue can come up with the first upgraded node. If Lucene is upgraded and a new index happens to get created, I think it allocates only on the nodes with the highest Lucene version. In my case, it refused to allocate its replica on a lower-level Lucene node, so the cluster never went green. Most of our indices are 1 shard, but I wonder whether creating an index with more than one shard would fail.

This happened more frequently on my test cluster with quick rollover; so far, I've never seen it happen on production-like clusters.
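
To see which replicas are holding the cluster at yellow in a case like this, one option is to filter the output of `GET _cat/shards?format=json`. The sketch below assumes the default columns (`index`, `shard`, `prirep`, `state`) are present and works on the already-parsed JSON list; the sample rows in the usage note are made up for illustration.

```python
from typing import Dict, List


def unassigned_replicas(shards: List[Dict[str, str]]) -> List[str]:
    """Return 'index[shard]' labels for replica shards that are unassigned."""
    return [
        f'{s["index"]}[{s["shard"]}]'
        for s in shards
        # prirep is "p" for a primary and "r" for a replica.
        if s["state"] == "UNASSIGNED" and s["prirep"] == "r"
    ]
```

For example, given rows for a hypothetical index `logs-000002` whose primary is `STARTED` and whose replica is `UNASSIGNED`, the function returns `["logs-000002[0]"]`.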

Also, we are now upgrading all nodes on a rack at the same time; it's quicker than a node at a time :-).

elasticforme · January 14, 2020, 5:56pm

I do the same: shut down everything and upgrade everything. It's done in less than 15 minutes, tops.
But then, my data is not mission critical.

system · February 11, 2020, 5:56pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
