The guideline of 20 shards per GB of heap is a maximum, not an ideal. If you have quite large shards, I would expect each node to hold considerably fewer shards than that.
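To make the arithmetic behind that guideline concrete, here is a minimal sketch. The function name and the 30 GB heap are illustrative assumptions, not anything Elasticsearch provides; the result is a ceiling to stay well under, not a target.

```python
def max_shards_for_heap(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on shards for one node under the 20-shards-per-GB guideline.

    Hypothetical helper for illustration only; treat the result as a ceiling.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    return int(heap_gb * shards_per_gb)


# Example: a node with an assumed 30 GB heap.
print(max_shards_for_heap(30))  # 600
```

With large shards, disk capacity and query latency will usually limit a node long before it reaches this count.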
Each query is executed in a single thread against each shard, but multiple shards can be processed in parallel. This means that the minimum latency will depend on the size of the shard as well as on the data and mappings used. 50GB is often used as a reasonable starting point, but the ideal shard size may be smaller or larger for your use case.
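As a rough illustration of how a target shard size turns into a primary shard count, here is a small sketch. The helper name and the 1200 GB index size are made-up assumptions, and the 50 GB default is only the starting point mentioned above, not a recommendation for your data.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float = 50.0) -> int:
    """Number of primary shards needed to keep each shard near the target size.

    Hypothetical helper; the right target size has to come from your own tests.
    """
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


# Example: an assumed 1200 GB index at two candidate shard sizes.
print(primary_shard_count(1200))      # 24 shards of about 50 GB
print(primary_shard_count(1200, 30))  # 40 shards of about 30 GB
```

Smaller shards give more parallelism per query but more shards to manage, so it is worth comparing a few candidate sizes rather than picking one up front.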
The number of nodes you need to hold your data will depend on query latency and throughput requirements as well as heap usage. If your data set is larger than what can be cached in RAM, the performance of your disks will be important and will affect latency. The more data a node holds, the more I/O is typically required to serve a query.
For this type of use case I would recommend running some tests and benchmarks to determine how large your shards should be and how much data each node can hold.
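Once a benchmark gives you a figure for how much data one node can hold while still meeting your latency target, the node count follows from simple arithmetic. The sketch below assumes that figure is already known; the function name and all the numbers are hypothetical placeholders, not measured values.

```python
import math


def nodes_needed(primary_data_gb: int, replicas: int, data_per_node_gb: int) -> int:
    """Data nodes required to hold primaries plus replicas.

    data_per_node_gb is assumed to come from your own benchmark; it is the
    amount of data one node can serve at acceptable latency.
    """
    if primary_data_gb <= 0 or data_per_node_gb <= 0 or replicas < 0:
        raise ValueError("invalid sizing inputs")
    total_gb = primary_data_gb * (1 + replicas)
    return math.ceil(total_gb / data_per_node_gb)


# Example: 1200 GB of primary data, 1 replica, 400 GB per node from a benchmark.
print(nodes_needed(1200, 1, 400))  # 6
```

This only covers storage; if throughput requirements are high you may need more nodes than the data volume alone suggests.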