Should we continue or stop indexing during rolling upgrades of Elasticsearch?

Hi team,

I read in the upgrade documentation that stopping indexing and performing a synced flush are optional steps. My question is: if we do not stop indexing, will it take a long time to rebalance the entire cluster?

I have 8 data nodes and 3 master nodes with 2 TB of data, and I am unsure whether to stop indexing or not. I am upgrading from 5.6 to 6.x via a rolling upgrade.

Just wanted to know the pros and cons.

As long as all indices have at least one replica, you don't need to stop indexing.

When a node goes down, ES will promote the other copy of each shard on that node to primary (if it isn't already) and continue writing to that copy. When the node comes back up, the shard copy on it will be synced from the other copy. If you can stop indexing, the shards will already be in a synced state.
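Getting the shards into that already-synced state before shutdown is what a synced flush does on 5.x/6.x. A minimal sketch (host and port are assumptions):

```shell
# Perform a synced flush so all shard copies carry the same sync_id
# marker; recovery after the node restart can then skip file copying.
curl -s -X POST "localhost:9200/_flush/synced?pretty"
```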

Once you shut down a node for upgrade, shards on that node that have a single replica will be running with a single copy. From the shutdown until the sync completes, all search queries for those shards will be sent to that single copy. There is a possibility that those queries run slowly due to the higher volume, or worse, that you see an OOM. If you script the upgrade steps, each node should only take a few minutes, so it's not a huge risk. If you stop indexing, you avoid the sync time and reduce the risk. In the absolute worst case, if you lose the disk holding the only remaining copy of a shard, you may lose data, but that's a very slim chance.
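The scripted steps mentioned above can be sketched per node roughly like this, following the rolling-upgrade documentation (host, port, and the systemd unit name are assumptions; adapt to your environment):

```shell
# 1. Stop replica re-allocation before taking the node down.
curl -s -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{ "transient": { "cluster.routing.allocation.enable": "none" } }'

# 2. Optional: synced flush so recovery on restart is fast.
curl -s -X POST "localhost:9200/_flush/synced"

# 3. Stop, upgrade, and restart Elasticsearch on this node.
sudo systemctl stop elasticsearch
sudo yum update -y elasticsearch    # or your offline/binary install
sudo systemctl start elasticsearch

# 4. Re-enable allocation and wait for green before the next node.
curl -s -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d '
{ "transient": { "cluster.routing.allocation.enable": "all" } }'
curl -s "localhost:9200/_cluster/health?wait_for_status=green&timeout=10m"
```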

If you disable shard allocation as described in the documentation, replica shards will not be re-allocated, so rebalancing will not happen. You also need to make sure the disk watermark thresholds won't trigger shard movement.
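The disk thresholds referred to here are the disk allocation watermarks; a quick way to inspect their current values (default low/high watermarks are 85%/90%):

```shell
# Show the effective disk-based allocation settings, including
# defaults, filtered down to just the relevant subtree.
curl -s "localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.cluster.routing.allocation.disk&pretty"
```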

Note that the cluster will start re-allocating as soon as you stop a node, as it rebuilds the replicas, creating shard movement, load, etc. So for upgrades and general reboots we set index.unassigned.node_left.delayed_timeout = 60m on all indexes; this means the cluster will wait an hour before allocating a new replica, giving you plenty of upgrade time.
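Applying that setting to all existing indexes can be sketched with the index settings API (the 60m value is from the post above; host and port are assumptions):

```shell
# Delay replica re-allocation for an hour whenever a node leaves,
# applied to every existing index via the _all wildcard.
curl -s -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' -d '
{ "settings": { "index.unassigned.node_left.delayed_timeout": "60m" } }'
```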

You can see the count of shards in this state via /_cluster/health as delayed_unassigned_shards.
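For example (host and port are assumptions):

```shell
# delayed_unassigned_shards counts shards whose re-allocation is
# currently being held back by a delayed_timeout.
curl -s "localhost:9200/_cluster/health?filter_path=delayed_unassigned_shards&pretty"
```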

Thanks, @Vinayak_Sapre.

It will help me upgrade smoothly to the 6.x version.

Thanks @Steve_Mushero. This setting will help in case we continue indexing.

As @Vinayak_Sapre mentions, you can also disable shard allocation globally, though whether that suits you may depend on how long your upgrade takes and how much new indexing or other activity you have going on. Ideally you are doing a yum update or something similar, so each node is down only a minute or two, but if the upgrade runs much longer, it could be different (we are in China, and some downloads or Docker pulls can take an hour).

I usually download the binaries for ES and its plugins and use an offline install to avoid uncertainty during installation. For K8s, AWS and GCP provide ECR and GCR respectively, both of which are highly available.


Totally agreed, though in a hurry, and with usually quick performance, I might have, um, perhaps, once or twice skipped that part ;) and things took a bit longer than expected (actually the real fun is Jenkins, which can take hours to download and fails often).

Also, on VMs, the VM itself should be updated and rebooted at least once before the ES upgrade; of course yum --downloadonly is your friend there, too. So we download everything, update & reboot the OS, then stop & update ES, then restart it and confirm it auto-starts.
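That download-first sequence can be sketched like this (package name and systemd unit are assumptions):

```shell
# Pre-fetch OS updates and the ES package without installing, so the
# actual maintenance window excludes download time.
sudo yum update -y --downloadonly
sudo yum update -y --downloadonly elasticsearch

# Later, during the maintenance window:
sudo yum update -y                  # installs from the local cache
sudo reboot
# After the reboot: stop ES, upgrade it, restart, confirm autostart.
sudo systemctl stop elasticsearch
sudo yum update -y elasticsearch
sudo systemctl start elasticsearch
sudo systemctl is-enabled elasticsearch
```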

In the old days we'd reboot first (after disabling FS boot checks), then update, then reboot, then update ES, then reboot - it's the right & safest way with servers on other continents; clouds make this so much easier now.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.