Thank you for responding. I understand your point about recovery always copying from the primary, and I will focus on 7.9. Before I upload data, I'd like to make terms like "slow" and "pretty fast" more objective.

On our largest 7.9.1 system, two indices are created each day. Each index ingests about 10GB/hr, or 240GB/day; there is 1 replica, so 480GB of storage per index per day; with 2 indices/day, we are pushing 1TB of storage per day. This goes into 40-50 shards/day (we've been tuning), so about 20GB/shard. We keep indices open for about 14 days, close them, and then keep the closed indices online for about 2 more weeks.
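For concreteness, here is the arithmetic behind those figures (the 48-shard count is just the midpoint of our 40-50 range):

```python
# Daily ingest and storage arithmetic for one day's pair of indices.
INGEST_GB_PER_HR = 10   # per index
REPLICAS = 1            # one replica -> two copies of every shard
INDICES_PER_DAY = 2
SHARDS_PER_DAY = 48     # midpoint of the 40-50 range we've been tuning

raw_gb_per_index = INGEST_GB_PER_HR * 24                   # 240 GB/day ingested
stored_gb_per_index = raw_gb_per_index * (1 + REPLICAS)    # 480 GB/day stored
stored_gb_per_day = stored_gb_per_index * INDICES_PER_DAY  # 960 GB/day, ~1 TB
gb_per_shard = stored_gb_per_day / SHARDS_PER_DAY          # ~20 GB per shard

print(raw_gb_per_index, stored_gb_per_index, stored_gb_per_day, gb_per_shard)
```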

There are about 50 data nodes. With about 50 shards/day and 30 days of retention, the cluster holds roughly 1,500 shards, so every data node that leaves takes out about 30 shards, on average about half assigned to open indices and the other half to closed ones.
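The 30-shards-per-departing-node estimate falls out the same way, assuming shards are spread evenly across data nodes (which is roughly what we observe):

```python
SHARDS_PER_DAY = 50
RETENTION_DAYS = 30   # ~14 days open + ~2 more weeks closed but online
DATA_NODES = 50

total_shards = SHARDS_PER_DAY * RETENTION_DAYS  # ~1500 shards in the cluster
shards_per_node = total_shards / DATA_NODES     # ~30 shards on each data node

print(total_shards, shards_per_node)
```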

The ES data nodes are Kubernetes pods. They leave the cluster periodically because we drain their K8s worker node for operational reasons. In a recent example, an ES data node (pod) leaving resulted in about 15 yellow open indices and 14 yellow closed indices. The data pod rescheduled, reconnected to its storage, and rejoined the ES cluster before the node_left timeout expired.
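For reference, the timeout I mean is the delayed-allocation setting; we rely on something like the following so shards are not reassigned while a drained pod reschedules (the `10m` value here is illustrative, not our actual setting):

```json
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}
```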

It took 2+ hours to recover the open indices. It then took another 2 hours to recover the 14 yellow closed indices, an average of about 8 minutes each. Some closed indices recovered in less than 1 minute, while others took 10-15 minutes.
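To put a number on it: assuming each closed index had roughly one ~20GB shard to rebuild (which matches the shard sizing above), the implied copy rate is modest:

```python
CLOSED_INDICES = 14
TOTAL_MINUTES = 120   # the 2 hours spent on closed-index recovery
GB_PER_SHARD = 20     # from the sizing above; assume ~1 recovering shard/index

minutes_per_index = TOTAL_MINUTES / CLOSED_INDICES  # ~8.6 minutes per index
mb_per_second = (GB_PER_SHARD * 1024) / (minutes_per_index * 60)  # ~40 MB/s

print(round(minutes_per_index, 1), round(mb_per_second, 1))
```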

Is 8 minutes/index for closed indices what you would call "pretty fast" recovery?