Hey there! I am working on a project that automates an Elasticsearch cluster upgrade (7.1.1 => 7.2.0). From reading the documentation on cluster upgrades, it seems that a reasonable approach is to take down one node at a time and reinstall the newer Elasticsearch package on it, which includes disabling shard allocation on each node before it is taken down. Will disabling shard allocation on one node affect cluster health (Green => Yellow?) in a cluster of 20 nodes or more? More broadly, what factors determine the health status of a cluster? Thanks!
Hi! The upgrade approach you describe is something that we call a "rolling upgrade," and it sounds like you've already looked at the documents on that. It's worth mentioning that shard allocation is a cluster-level setting, and not something that is enabled or disabled on a node-by-node basis. You may be posting the setting to a particular node's _cluster/settings endpoint, but it affects the whole cluster.
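To make the cluster-level nature of the setting concrete, here is a minimal Python sketch that builds the request a rolling-upgrade script might send. The helper name allocation_request and the localhost URL are placeholders I made up for illustration; the setting name cluster.routing.allocation.enable and the value "primaries" are what the rolling upgrade documentation uses, and sending null resets the setting to its default.

```python
import json
import urllib.request


def allocation_request(base_url, mode):
    """Build a PUT to _cluster/settings (hypothetical helper).

    mode="primaries" restricts allocation to primary shards;
    mode=None is serialized as null, which resets the setting.
    """
    body = {"persistent": {"cluster.routing.allocation.enable": mode}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed address: any node's HTTP endpoint works, the change is cluster-wide.
disable = allocation_request("http://localhost:9200", "primaries")
enable = allocation_request("http://localhost:9200", None)

# To actually send one of them: urllib.request.urlopen(disable)
```

The same request body goes out no matter which node receives it, which is why you only need to send it once per node restart rather than once per node in the cluster.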
Disabling shard allocation by itself will not change a cluster's health. The health would go from GREEN to YELLOW when you stop the node, and it should go back to GREEN once the node is upgraded and back online.
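For an automated upgrade, you would probably want to wait for the cluster to return to GREEN before moving on to the next node. Below is a rough Python sketch of that check. The function names, the localhost URL, and the 60s timeout are my own placeholders; wait_for_status and timeout are query parameters of the cluster health API, and status and timed_out are fields of its response. I am also assuming that a timed-out wait may come back as HTTP 408, so the sketch treats that as "not ready yet."

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url, wait_for="green", timeout="60s"):
    """Build a _cluster/health URL that blocks until the status is reached."""
    query = urllib.parse.urlencode({"wait_for_status": wait_for, "timeout": timeout})
    return base_url.rstrip("/") + "/_cluster/health?" + query


def is_ready(health):
    """True only if the cluster reports green and the wait did not time out."""
    return health.get("status") == "green" and not health.get("timed_out", False)


def wait_for_green(base_url):
    """Ask the cluster once; returns False if it is not green within the timeout."""
    try:
        with urllib.request.urlopen(health_url(base_url)) as resp:
            return is_ready(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 408:  # assumption: a timed-out wait is answered with 408
            return False
        raise
```

A script could call wait_for_green("http://localhost:9200") in a loop after re-enabling shard allocation, and only proceed to the next node once it returns True.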
There is a detailed discussion of how cluster health is determined on the Cluster Health REST API document page. The key paragraph is this one, which explains how you start with shard health and work your way up to cluster health:
The cluster health status is: green, yellow or red. On the shard level, a red status indicates that the specific shard is not allocated in the cluster, yellow means that the primary shard is allocated but replicas are not, and green means that all shards are allocated. The index level status is controlled by the worst shard status. The cluster status is controlled by the worst index status.
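The roll-up from shard to index to cluster can be sketched in a few lines of Python. This is only an illustration of the rule in the paragraph above, not how Elasticsearch implements it; the function names and the tuple layout are made up for the example.

```python
# Higher number means worse status.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def shard_status(primary_assigned, replicas_assigned, replicas_expected):
    """Status of one shard group (a primary plus its replicas)."""
    if not primary_assigned:
        return "red"
    if replicas_assigned < replicas_expected:
        return "yellow"
    return "green"


def worst(statuses):
    """The worst status in a collection; an empty collection counts as green."""
    return max(statuses, key=SEVERITY.__getitem__, default="green")


def cluster_status(indices):
    """indices maps an index name to a list of
    (primary_assigned, replicas_assigned, replicas_expected) tuples."""
    return worst(
        worst(shard_status(*shard) for shard in shards)
        for shards in indices.values()
    )


# Hypothetical cluster: one replica of "logs" lived on the node being upgraded.
example = {
    "logs": [(True, 1, 1), (True, 0, 1)],
    "users": [(True, 1, 1)],
}
print(cluster_status(example))  # prints "yellow"
```

A single shard group with a missing replica is enough to make its index, and therefore the whole cluster, YELLOW, regardless of how many nodes the cluster has.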
Here is one way to think of how shard allocation settings relate to cluster health. A cluster will probably not be able to recover from a RED state on its own. Someone will have to fix it. However, given time, a properly configured cluster should be able to go from YELLOW to GREEN; it does this by reallocating shards. But when shard reallocation is disabled, the cluster will not be able to repair itself by recreating missing replica shards. When you are upgrading, this is fine, because you know that the node is only temporarily offline. But normally you want the cluster to rebalance itself when shards go missing, as this will help protect you from data loss.
I hope this answer is helpful!
Hi William, thanks for your answer! This is actually a lot more helpful than the docs!