Rolling upgrade: How many times should I flush?

Dennis1 · January 12, 2021, 9:20pm

If I follow the steps in the Rolling Upgrade docs in the order they're written, it seems to suggest performing a flush for every node I upgrade in my cluster.

Is this really necessary, or would performing the flush once at the very beginning of the rolling upgrade probably be enough for purposes of speeding up shard recovery at the end of the whole process?

warkolm · January 12, 2021, 9:21pm

If the indices are not being written to, the flush won't be a major issue. How large is your cluster?

Dennis1 · January 12, 2021, 9:23pm

My cluster is 4 nodes (3 master-eligible, and 1 Kibana/coordinating-only node).

I can stop data ingestion to the cluster for the most part, but would likely (especially in the future) still have some Metricbeat data trying to come in, which wouldn't be as feasible to turn off.

DavidTurner · January 12, 2021, 9:26pm

No, as the docs say, it's optional. It makes shard recovery quicker, often saving more time than it takes to do the flush after each node, but it's definitely not necessary.

Dennis1 · January 12, 2021, 9:29pm

@DavidTurner Thank you. Yes, sorry, I shouldn't have used "necessary" in my original question. What I was more curious about is how much doing so will influence the speed of recovery at the end.

In other words, if I flush just the first time, how much slower would recovery be than if I were to flush for every node? Would it even be noticeable? (Assuming a cluster size of <10 nodes.)

DavidTurner · January 12, 2021, 9:36pm

If you're not indexing at all then it's a wash: it won't make recovery any faster but nor will it take any time to do the flush for each node. If you carry on indexing then I'd usually expect repeated flushes to save time overall — that's why it's recommended — but the time saved is going to depend on how hard you're indexing and how your shards are allocated. There's no definite answer, sorry, you'll have to experiment.

Maybe we should turn the question around: why is it proving difficult for you to flush as recommended?

Dennis1 · January 12, 2021, 9:43pm

@DavidTurner (and @warkolm) Thank you! That's all I was basically interested in—i.e., what factors to be aware of that may make it more valuable to re-flush each time or not. It sounds like the main factor would be whether I'm still indexing or not, and the volume of indexing that's occurring over the course of the upgrade (which itself relates to the number of nodes in my cluster that need to be upgraded).

I'm still just getting ready for a rolling upgrade, but I'll probably opt to flush each time just to prepare for scaling in the future.

kkn87 · January 14, 2021, 6:37am

We do rolling upgrades automatically, using Ansible. We just added a /_flush task to the script before each node's update.
Also, at the start, we've added a step that changes 'index.unassigned.node_left.delayed_timeout' to 10 minutes on all indices to make recoveries a bit faster after a node restart.

Using this approach, a rolling upgrade of our cluster (6 data nodes, 20 TB of data) takes 2 hours. That includes OS updates and node restarts.
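
For readers automating the same step outside Ansible, a per-node flush call might be sketched as follows. This is only an illustration, not kkn87's playbook: it assumes an unauthenticated cluster reachable over plain HTTP, and `build_flush_request` and `flush_all` are hypothetical helper names.

```python
import urllib.request


def build_flush_request(base_url: str) -> urllib.request.Request:
    """Build a POST request to the flush API covering all indices."""
    # Assumption: base_url looks like "http://localhost:9200" (no auth, no TLS).
    url = base_url.rstrip("/") + "/_flush"
    return urllib.request.Request(url, method="POST")


def flush_all(base_url: str, timeout: float = 60.0) -> int:
    """Send the flush request and return the HTTP status code."""
    request = build_flush_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `flush_all(...)` immediately before stopping each node mirrors the "flush before each node's update" ordering described above.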

Dennis1 · January 14, 2021, 4:27pm

@kkn87 Thanks for the tips and sharing your approach/experience!

I'm curious why increasing index.unassigned.node_left.delayed_timeout to 10 mins speeds up the recovery process. Are you choosing to increase that timeout parameter as opposed to disabling replica shard allocation?

kkn87 · January 15, 2021, 10:00am

@Dennis1, are you suggesting not to do this because, when shard allocation is disabled (switched to 'primaries'), changing index.unassigned.node_left.delayed_timeout is redundant?
Maybe I misunderstood this article https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html then.

Dennis1 · January 15, 2021, 2:16pm

@kkn87 Yes, that's my understanding at least. The only difference between increasing index.unassigned.node_left.delayed_timeout and disabling and then re-enabling cluster.routing.allocation.enable (as in steps 1 & 9 here) is that the latter gives time for the re-balancing (or "recovery") to occur when each node is brought back online after it's been upgraded. If the delay is kept at 10 mins for the entire upgrade process, that recovery step may never occur, which I guess is not recommended, especially in cases where there is a lot of indexing still happening during the upgrade.
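
As a rough illustration of the disable/re-enable step mentioned above, the setting is changed through the cluster settings API. The sketch below only builds the requests; `allocation_request` is a hypothetical helper name, and the unauthenticated plain-HTTP base URL is an assumption. Passing `None` sends `null`, which resets the setting to its default.

```python
import json
import urllib.request
from typing import Optional


def allocation_request(base_url: str, enable: Optional[str]) -> urllib.request.Request:
    """Build a PUT to _cluster/settings for cluster.routing.allocation.enable.

    enable="primaries" restricts allocation to primary shards before a node
    is stopped; enable=None serializes to JSON null, restoring the default.
    """
    body = {"persistent": {"cluster.routing.allocation.enable": enable}}
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a per-node loop, `send(allocation_request(url, "primaries"))` would run before the node is stopped and `send(allocation_request(url, None))` after it rejoins.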

kkn87 · January 18, 2021, 7:38am

@Dennis1, thank you for noticing this! Will remove this step from the automation.

Dennis1 · January 19, 2021, 4:49pm

@kkn87 Sure thing, just make sure to add some sort of automated check that the cluster status has returned to 'green' after upgrading each node, before moving on to the next node. The green status indicates that the recovery process is finished.

kkn87 · January 20, 2021, 7:42am

Sure, @Dennis1, we have quite comprehensive checks that wait for a completely 'green' cluster:

url: '{{ elasticsearch_url }}/_cluster/health'
...
until: >
  result.status == 200
  and result.json.status == 'green'
  and result.json.unassigned_shards == 0
  and result.json.relocating_shards == 0
  and result.json.number_of_nodes == previous_result.number_of_nodes
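
The same condition can be expressed outside Ansible. The following is a minimal sketch, not kkn87's actual automation: `is_fully_recovered` and `wait_until_recovered` are hypothetical names, `fetch_health` is a caller-supplied function returning the HTTP status and parsed JSON of `_cluster/health`, and `expected_nodes` stands in for `previous_result.number_of_nodes`.

```python
import time
from typing import Any, Callable, Mapping, Tuple

HealthResult = Tuple[int, Mapping[str, Any]]


def is_fully_recovered(status_code: int, health: Mapping[str, Any], expected_nodes: int) -> bool:
    """Apply the same five conditions as the 'until' expression above."""
    return (
        status_code == 200
        and health.get("status") == "green"
        and health.get("unassigned_shards") == 0
        and health.get("relocating_shards") == 0
        and health.get("number_of_nodes") == expected_nodes
    )


def wait_until_recovered(
    fetch_health: Callable[[], HealthResult],
    expected_nodes: int,
    retries: int = 60,
    delay: float = 10.0,
) -> bool:
    """Poll until the cluster is fully recovered or retries run out."""
    for attempt in range(retries):
        status_code, health = fetch_health()
        if is_fully_recovered(status_code, health, expected_nodes):
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # wait before the next poll
    return False
```

Checking the node count as well as the colour guards against a cluster that reports 'green' while the restarted node has not yet rejoined.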
