As I understand it, Elasticsearch's query speed is largely a function of the term dictionary size rather than the raw document count.
In my scenario, I have a large set of data that can be distributed across any arbitrary number of shards. The terms would be basically identical for each shard. The only variable would, of course, be which documents match which term.
As I see it, the more shards I have, the more duplicate term dictionaries I'll carry: more disk, more RAM usage. Assuming 8 shards per node across 16 nodes (8 × 16 = 128 shards), I'd saturate the vCPU limit (8 per node) and the cluster's node limit (16), thus achieving maximum parallelism.
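To make the tradeoff concrete, here is a back-of-the-envelope sketch of the math above. The term-dictionary size per shard is a purely hypothetical number for illustration, not a measured value:

```python
# Parallelism ceiling: one primary shard per vCPU, per node.
vcpus_per_node = 8
nodes = 16
max_parallel_shards = vcpus_per_node * nodes  # 8 x 16 = 128 shards

# If every shard carries an essentially identical term dictionary,
# the total term-dictionary footprint grows linearly with shard count.
term_dict_mb_per_shard = 200  # hypothetical size for one shard
total_term_dict_mb = max_parallel_shards * term_dict_mb_per_shard

print(max_parallel_shards)  # 128
print(total_term_dict_mb)   # 25600 MB of (mostly redundant) term dictionaries
```

So under these assumptions, maximizing parallelism multiplies the duplicated term-dictionary overhead by 128, which is exactly the cost I'm worried about.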
Is my logic correct? Or is there an advantage I'm missing to having more, smaller shards (document-count-wise) despite the duplicate term dictionaries in each?