I think the master is probably the ultimate limiting factor within a single cluster. Most components scale sideways quite well, but there's only ever one elected master making decisions about the cluster, and there's a limit to how large or fast you can reasonably make that node. More clusters means more masters, each doing less work, which is good for everyone.
A pattern I've encountered a few times: the cluster grows and grows without showing any problems, then something upsets it and the master struggles to keep up with coordinating all the recovery work needed to bring things back to health. Often this is exacerbated by relaxing the defaults on certain safety mechanisms (for "scaling" reasons) without really understanding the consequences if, say, a network partition were to drop 25% of your nodes for a few minutes.
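To give a flavour of the kind of change I mean, here's a rough sketch in Python against the cluster settings API. The endpoint and the two settings are real ones, but the values and the localhost URL are placeholders for illustration, not a recommendation; both settings exist precisely to limit how much trouble the master can be asked to absorb at once.

```python
# Illustrative only: the sort of "scaling" tweaks people make. Each one
# trades away protection that the defaults provide.
import requests

CLUSTER = "http://localhost:9200"  # assumption: a locally reachable cluster

relaxed_settings = {
    "persistent": {
        # Default is 1000 shards per (non-frozen) data node; raising it lets
        # the shard count balloon past what the master comfortably manages.
        "cluster.max_shards_per_node": 5000,
        # More concurrent recoveries per node means more simultaneous work
        # for the master to coordinate when something goes wrong.
        "cluster.routing.allocation.node_concurrent_recoveries": 8,
    }
}

resp = requests.put(f"{CLUSTER}/_cluster/settings", json=relaxed_settings, timeout=10)
resp.raise_for_status()
print(resp.json())
```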
For instance, on the master there are a few places where we iterate over all the nodes and/or all the shards, which is totally reasonable when you have 50 nodes and 30k shards and pretty unreasonable when you have 500 nodes and 300k shards. We're aware of more efficient algorithms for cases like that, but they usually come at the cost of making things worse for smaller clusters and harder to maintain, so we tend to avoid that complexity and instead target a limited cluster size.
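As a toy illustration of why that kind of loop bites at scale (this is a model of a pass that considers every node for every shard, not the real allocator):

```python
# Toy model: a master-side decision that scans every node for every shard
# does nodes * shards units of work per pass.
def allocation_pass_work(nodes: int, shards: int) -> int:
    work = 0
    for _ in range(shards):     # e.g. deciding where each shard should live
        for _ in range(nodes):  # ...by considering every node as a candidate
            work += 1
    return work

small = allocation_pass_work(50, 30_000)     #   1,500,000 checks
large = allocation_pass_work(500, 300_000)   # 150,000,000 checks
print(large // small)                        # 100x the work for a 10x larger cluster
```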
30k shards isn't small by any measure; it could easily be a couple of PB of data.
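Rough arithmetic, with the average shard size purely an assumption on my part:

```python
# Back-of-the-envelope: total data volume for 30k shards at a couple of
# plausible average shard sizes (decimal units, GB -> PB).
shards = 30_000
for avg_gb in (50, 70):  # common sizing guidance is in the tens of GB per shard
    total_pb = shards * avg_gb / 1_000_000
    print(f"{shards} shards x {avg_gb} GB ≈ {total_pb:.1f} PB")
# 30000 shards x 50 GB ≈ 1.5 PB
# 30000 shards x 70 GB ≈ 2.1 PB
```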