Regarding Elasticsearch rolling upgrades

Hi,

We are trying to upgrade our Elasticsearch cluster from 7.17 to 8.x, and we have to do it in a rolling fashion.

I found the documentation below, which is for version 7.x.

  1. Is there no similar doc for 8.x, or is this one still valid for an upgrade to 8.x?
  2. As per this doc, we disable shard allocation before stopping a node. If new data gets indexed onto this node before we stop it for the upgrade, doesn't that create a primary-down situation (RED state) when we stop the node?

Hi,

Regarding your question about shard allocation, when you disable shard allocation before stopping a node, Elasticsearch will not allocate any new shards to that node, and it will not relocate any shards away from that node. This means that if new data gets indexed into the cluster while shard allocation is disabled, the new shards will be allocated to the other nodes in the cluster, not the one you're about to stop.

Regards

That doesn't look like the case. As per the documentation, "cluster.routing.allocation.enable": "primaries" means 'Allows shard allocation only for primary shards'. This is a cluster-wide setting and has no node-specific configuration, so there is still a possibility of new primaries being created on the target node.
Let me know if I'm mistaken about something here. Thanks for the response.
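
For reference, the setting quoted above is applied through the Cluster Update Settings API and takes effect for the whole cluster. Below is a minimal sketch that only builds the request bodies; it assumes the "persistent" scope, and the helper name allocation_settings_body is hypothetical, not part of any client library. Sending the request is left out.

```python
import json

SETTING = "cluster.routing.allocation.enable"


def allocation_settings_body(value):
    """Build a body for PUT /_cluster/settings (hypothetical helper).

    value="primaries" allows allocation only for primary shards, cluster-wide.
    value=None serializes to JSON null, which resets the setting to its default.
    """
    return {"persistent": {SETTING: value}}


if __name__ == "__main__":
    # Before stopping a node: restrict allocation to primaries.
    print(json.dumps(allocation_settings_body("primaries"), indent=2))
    # After the upgraded node has rejoined: reset to the default.
    print(json.dumps(allocation_settings_body(None), indent=2))
```

Note that nothing in the body names a node, which is the point being made: the restriction cannot be scoped to the node that is about to be stopped.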

Hi,

If you have the necessary storage, you can:

  1. Evacuate the node: use the Cluster Update Settings API to set cluster.routing.allocation.exclude._ip or cluster.routing.allocation.exclude._name to the IP or name of the node you want to evacuate. This starts relocating shards from that node to other nodes in the cluster.
  2. Wait for the node to be evacuated: you can monitor progress by checking the relocating_shards field in the _cluster/health API.
  3. Stop the node.
  4. Upgrade and restart the node.
  5. Re-include the node:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}
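
The evacuation steps above could be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the inputs are the already-parsed JSON responses of GET /_cluster/health and GET /_cat/shards?format=json, and node names are assumed to contain no spaces. No HTTP calls are made here.

```python
def exclude_body(node_ip):
    """Body for PUT /_cluster/settings that excludes a node by IP.

    Passing None produces JSON null, which removes the exclusion again.
    """
    return {"transient": {"cluster.routing.allocation.exclude._ip": node_ip}}


def relocation_finished(health):
    """True when the cluster health response reports no shards in flight."""
    return health["relocating_shards"] == 0


def shards_left_on(node_name, cat_shards):
    """Rows from _cat/shards that are still located on node_name.

    For a relocating shard the node column reads "source -> target ...",
    so only the first token (the source node) is compared.
    Unassigned shards have no node and are skipped.
    """
    return [
        row for row in cat_shards
        if row.get("node") and row["node"].split(" ")[0] == node_name
    ]
```

A relocating_shards value of 0 only says that nothing is moving right now, so checking that shards_left_on returns an empty list before stopping the node is probably the safer condition.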

Regards

I see. However, this is not something that Elastic recommends doing during an upgrade, and it's not mentioned in their docs. One of the problems I see with it is that it creates unnecessary I/O, which is something we are advised to avoid during an upgrade.
I'm just wondering how the RED index situation I mentioned is avoided in the official rolling upgrade doc.

I don't think you can avoid the scenario you described, where allocation is set to just primaries and a new index is created at the exact moment the node is turned off.

However, this seems to be an edge case. I'm also not sure it would actually happen or how to replicate it; too many things would need to happen at the same time for this to lead to a RED cluster.

Only someone from Elastic with more in-depth knowledge of how things work internally can give a definitive answer.

In my opinion, you should not worry much about this happening.

You can also follow the optional recommendation of turning off ingestion during the rolling upgrade process if you want.

You also do not need to exclude a node from allocation for the rolling upgrade process. This is not recommended, since it may take a long time to empty each node and then fill it up again.
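
To make the edge case concrete: an index should only go RED when a primary is lost and no started replica of that shard exists on another node to take over. A rough way to see which shards are exposed before stopping a node is sketched below. It assumes the input is the parsed JSON of GET /_cat/shards?format=json (fields index, shard, prirep, state, node); the function name is hypothetical.

```python
def primaries_without_replica(node_name, cat_shards):
    """List (index, shard) pairs whose primary sits on node_name and which
    have no STARTED replica on any other node.

    These are the shards that would be left without an assigned copy,
    and so turn their index RED, if node_name were stopped right now.
    """
    # Shards that have a usable replica somewhere other than node_name.
    backed = {
        (row["index"], row["shard"])
        for row in cat_shards
        if row["prirep"] == "r"
        and row["state"] == "STARTED"
        and row.get("node") != node_name
    }
    return sorted(
        (row["index"], row["shard"])
        for row in cat_shards
        if row["prirep"] == "p"
        and row.get("node") == node_name
        and (row["index"], row["shard"]) not in backed
    )
```

With allocation set to "primaries", an index created after the setting was applied would, as far as I understand it, get its primaries assigned but not its replicas, so it is exactly the kind of shard this check would flag. An empty result suggests stopping the node should not cause a RED index.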