Search performance - Scaling options: horizontally vs. vertically

I have observed that Elasticsearch defaults the search thread pool to 3 x the number of CPUs, and that raising it to a fixed number does not really help, because the extra threads just end up sharing the same CPU cycles.
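To make the arithmetic concrete, here is a minimal sketch of that default, assuming the 3 x CPUs multiplier I observed (it may differ between Elasticsearch versions). The function name is hypothetical and is not part of any Elasticsearch API; the node and vCPU counts are the ones from my cluster described further down.

```python
def default_search_pool_size(cpu_count: int, multiplier: int = 3) -> int:
    """Search thread pool size under the observed 'multiplier x CPUs' default.

    The multiplier of 3 is an assumption taken from observation,
    not a value read from Elasticsearch itself.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return multiplier * cpu_count


vcpus_per_node = 4   # each node is a VM with 4 vCPUs
node_count = 6       # 6 node cluster

threads_per_node = default_search_pool_size(vcpus_per_node)
threads_in_cluster = threads_per_node * node_count

print(f"search threads per node: {threads_per_node}")
print(f"search threads in cluster: {threads_in_cluster}")
```

A larger pool only changes how many threads compete for the same 4 vCPUs on a node. It does not add CPU time.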

Does this mean that, to keep the same response times with more concurrent searches, I have to scale either vertically by adding more CPU cores, or horizontally by adding nodes and replicas so that searches are spread across the replica shards?

Here is my situation:
I have a 6 node ES cluster with 6 shards storing a total of 60 million documents (around 2 KB each), with an index size of around 32 GB. Each node is a VM with 4 vCPUs and 8 GB allocated to ES. With 15 concurrent users the response time is around 270 ms, and I see that all 12 threads in the search pool are busy. If I increase the number of concurrent users, the response time keeps climbing and more requests pile up in the search queue. I also tried increasing the search thread pool size, but it did not really help.
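As a rough back-of-the-envelope check on these numbers, the sketch below applies Little's law (throughput = concurrency / latency). It assumes the cluster is already CPU-saturated at 15 users, so throughput stays flat and extra users only add latency. That saturation is an assumption based on the busy thread pool, not something I have measured, and the function names are hypothetical.

```python
def throughput_per_second(concurrent_requests: float, avg_latency_s: float) -> float:
    """Little's law: sustained throughput = concurrency / average latency."""
    if avg_latency_s <= 0:
        raise ValueError("avg_latency_s must be positive")
    return concurrent_requests / avg_latency_s


def latency_at_fixed_throughput(concurrent_requests: float, max_throughput: float) -> float:
    """Average latency in seconds if throughput cannot rise above max_throughput."""
    if max_throughput <= 0:
        raise ValueError("max_throughput must be positive")
    return concurrent_requests / max_throughput


# Observed: 15 concurrent users at roughly 270 ms per search.
observed_throughput = throughput_per_second(15, 0.270)
print(f"observed throughput: {observed_throughput:.1f} searches/s")

# Assumption: throughput is capped at the observed value (CPU-bound cluster).
for users in (15, 30, 60):
    latency_ms = latency_at_fixed_throughput(users, observed_throughput) * 1000
    print(f"{users} users: about {latency_ms:.0f} ms")
```

If that assumption holds, latency grows linearly with the number of users, which matches the growing search queue. The only way to lower it would then be to raise the throughput ceiling itself, through more cores, more nodes with replicas, or cheaper queries.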