I know the default in Logstash is to create a new index daily. I've also seen in different places people recommend not running more than a couple shards per node, because each shard is a Lucene instance under the surface.
Theoretically they're perhaps right, but having big indexes with a larger number of shards that span over longer periods of time is inflexible from a scaling perspective (you can't change the number of shards in an index without reindexing) and you risk ending up with really big shards. To avoid excessive recovery times you'll want to keep your shards up to a few tens of gigabytes.
They could also have meant no more than a couple of shards per index but having multiple indexes. A node can easily host hundreds of shards.
If you're creating a new index daily, my understanding is you're creating a couple shards daily.
What am I missing? It doesn't seem possible that people are adding new nodes for every new day's worth of data.
No, of course not. Daily indexes are fine, but you may want to reduce the number of shards per index from the default of five. Each shard carries a constant cost so you don't want too many of them, but you also want data to be distributed across nodes. What's optimal depends on how many nodes you have, how many documents per day, the size of the documents, how many days' worth of indexes you're going to keep online, what type of queries are made, how frequently queries span over multiple days ... you get the idea.