I suppose my concern is:
-The cluster is built with 10 nodes and elasticsearch.yml originally populated accordingly (assuming we have sufficient replicas, we can allow 2 nodes to fail)
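For reference, a minimal sketch of the static gateway settings I mean, as they might look for the original 10-node cluster (values illustrative, not my exact config):

```yaml
# elasticsearch.yml on each node (static settings, read at startup)
gateway.recover_after_nodes: 8    # start recovery once 8 nodes have joined
gateway.expected_nodes: 10        # or immediately, once all 10 are present
gateway.recover_after_time: 5m    # otherwise wait up to 5m after the threshold
```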
-Now assume we add 10 additional nodes. As long as everything runs as intended, there are no issues. But say 5 nodes all go down at once (poor planning: they were all on the same VM host, and that host was taken down). The cluster would still have 15 nodes, which is more than enough to kick off the recovery logic under the original settings. But that may not be the intention - in this 20-node cluster we intend recover_after_nodes to be 18 and expected_nodes to be 20, and there doesn't seem to be a way to apply that without a full cluster restart.
-But let's also go the other direction. We have a 20 node cluster with the intended recovery settings.
-If our workload decreases to the point where we think 10 nodes can do the same job, how would we adjust the recovery settings in that direction? Once the cluster has only 10 nodes, it will never meet the 18 or 20 nodes the recovery settings require.
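Concretely, after scaling down from 20 to 10 nodes, every node would still carry stale values like these (again illustrative), and a full-cluster restart could then never satisfy them:

```yaml
# elasticsearch.yml on each node - stale after scaling 20 -> 10 nodes
gateway.recover_after_nodes: 18   # unreachable with only 10 nodes
gateway.expected_nodes: 20        # likewise never met
```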
I suppose it feels odd for this to be a node-level setting - my gut tells me it's better suited to the master level, but that doesn't appear to be the case.
"The following static settings, which must be set on every data node in the cluster, control how long nodes should wait before they try to recover any shards which are stored locally"