Hi,
When upgrading an ES 5.x cluster to a newer version, the rolling restart procedure states that you should:
- (stop indexing new data to speed up recovery)
- disable shard allocation to prevent Elasticsearch from reallocating the missing shards, when you are absolutely sure that the node maintenance downtime will be short
- (shut down a single node / perform maintenance / restart this node)
- confirm that the restarted node has successfully joined the cluster
- re-enable shard allocation
- wait for the cluster to return to a green status (usually after having rebalanced a few shards)
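To make the steps above concrete, here is a minimal sketch of the REST calls I have in mind, written as plain Python data so that nothing is sent to a cluster. The function names (`allocation_settings`, `rolling_restart_requests`) are hypothetical helpers of mine; the setting `cluster.routing.allocation.enable` and the `_cluster/settings`, `_cat/nodes` and `_cluster/health` endpoints are the ones I understand the 5.x docs to use, and the 60s health timeout is an arbitrary assumption.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
ALLOWED = ("all", "primaries", "new_primaries", "none")


def allocation_settings(enable):
    """Build the body for PUT /_cluster/settings that toggles shard allocation."""
    if enable not in ALLOWED:
        raise ValueError("unsupported allocation mode: %r" % (enable,))
    return {"transient": {"cluster.routing.allocation.enable": enable}}


def rolling_restart_requests():
    """Return the (method, path, body) calls around one node restart, in order.

    The node shutdown / maintenance / restart itself happens between the
    first and second entries and is not an HTTP call.
    """
    return [
        ("PUT", "/_cluster/settings", allocation_settings("none")),
        # ... stop the node, perform maintenance, start the node ...
        ("GET", "/_cat/nodes", None),  # confirm the node has rejoined
        ("PUT", "/_cluster/settings", allocation_settings("all")),
        # Timeout value is an assumption, tune it to the cluster.
        ("GET", "/_cluster/health?wait_for_status=green&timeout=60s", None),
    ]


if __name__ == "__main__":
    for method, path, body in rolling_restart_requests():
        print(method, path, json.dumps(body) if body is not None else "")
```

Each tuple can be passed to whatever HTTP client or Ansible `uri` task is already in use.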
Since this procedure is meant for planned downtime that is quite short, and the cluster turns yellow during it anyway, wouldn't it be simpler to just use delayed allocation?
Note: this is an open question, and I'm fully aware of the difference between the standard "disable shard allocation" method and the "delayed allocation" method (the effect of the latter can easily be achieved with the former). Furthermore, with IT automation tools such as Ansible, the downtime of a node is usually quite short, so relying on "delayed allocation" could spare many people from developing their own roles and let them stick to the default ansible-elasticsearch role.
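For reference, this is roughly what I mean by the delayed allocation alternative: a small sketch that only builds the request, assuming the index-level setting `index.unassigned.node_left.delayed_timeout` (which defaults to 1m as far as I know). The helper name `delayed_allocation_request` and the 5m value are hypothetical choices of mine, not something taken from the upgrade guide.

```python
import re

# Elasticsearch time values: a number followed by a unit, or a bare 0.
_TIME_VALUE = re.compile(r"^(0|\d+(d|h|m|s|ms|micros|nanos))$")


def delayed_allocation_request(timeout="5m", index="_all"):
    """Build the (method, path, body) call that sets the delayed allocation timeout.

    While the timeout runs, replicas of a departed node stay unassigned
    instead of being rebuilt elsewhere, so a quickly restarted node can
    pick its shards up again.
    """
    if not _TIME_VALUE.match(timeout):
        raise ValueError("not a valid time value: %r" % (timeout,))
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return ("PUT", "/%s/_settings" % index, body)


if __name__ == "__main__":
    print(delayed_allocation_request("5m"))
```

With this in place, the node restart would need no call before or after it, as long as the node comes back within the timeout.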
Best regards,
Charles.w