Downsides of over-allocating shards?

Shay, we recently ran into an issue and I wanted to get your feedback. In our development environment we went 100% the other way: we decreased the number of indexes and shards, and increased our tiered merge policy setting from 5gb to 20gb to reduce the number of segments in the system. This resulted in larger segments and de-duplicated a lot of the TermInfoIndex items.

After doing this our memory looked great! We did not have to touch the TermIndex Divisor or anything like that. As an unintended side effect, search / query time went through the roof. Looking at some stack traces, a LOT of time was being spent performing the searches themselves. Example:

org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:169)

It looks like a single-threaded operation (1 CPU). So my question is this: do we increase the number of shards to increase CPU concurrency, thus decreasing search time and improving performance? For example:

2 servers, 12 CPUs each.

1 Index: 24 shards evenly balanced.
A search operation will optimistically (if allocated fairly) use all 24 CPUs in the system. The cost in RAM increases quite a bit, as TermInfos are duplicated and there are more segments and more stuff to load into memory.

1 Index: 2 shards evenly balanced.
A search operation will use 2 CPUs in the system. RAM is reduced quite a bit, segments decrease, etc.
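
To make the arithmetic behind the two scenarios explicit, here is a rough sketch of the model I have in my head. It assumes that a single search uses exactly one thread per shard, that shards are spread as evenly as possible across the servers, and that there are no replicas and nothing else competing for CPU. The function name and the model itself are my own for illustration, not anything taken from Elasticsearch or Lucene.

```python
def cpus_used_per_search(num_shards: int, num_nodes: int, cpus_per_node: int) -> int:
    """Rough upper bound on the CPUs one search can keep busy.

    Assumptions (mine, not confirmed): one thread per shard per search,
    shards balanced as evenly as possible, no replicas, an idle cluster.
    """
    if num_shards < 0 or num_nodes <= 0 or cpus_per_node <= 0:
        raise ValueError("shards must be >= 0; nodes and CPUs must be > 0")

    # Spread shards evenly; the first `extra` nodes each get one more shard.
    base, extra = divmod(num_shards, num_nodes)

    total = 0
    for node in range(num_nodes):
        shards_here = base + (1 if node < extra else 0)
        # A node cannot run more shard searches in parallel than it has CPUs.
        total += min(shards_here, cpus_per_node)
    return total


# The two layouts from above: 2 servers with 12 CPUs each.
for shards in (24, 2):
    print(shards, "shards ->", cpus_used_per_search(shards, 2, 12), "CPUs")
```

Under this model, going past 24 shards on this hardware would not buy any more per-search concurrency, while the RAM cost would keep growing.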

Is this correct?

Thanks!