TL;DR - improving your storage I/O bandwidth will improve search performance, but I'm not specifically encouraging you to have multiple shards per index per node.
Again, this is just flatly false. Multiple shards of the same index (or of different indices, it matters not) that are located on the same node can receive search requests concurrently and fulfill those requests independently, to the extent that the OS can make the physical resources (CPU cores and storage read capacity) available to perform them.
So while it is true that the more shards on a node that are handling tasks at the same time, the more likely they are to contend for those resources, the quoted statement is either accidentally phrased poorly or just demonstrates a lack of understanding.
The first cited reference states the situation much more accurately, alluding to the contention for disk reads. What it omits is that each shard has memory-based work it can do before requiring any read, and with sufficient CPUs, shards can do that work in parallel. The results from all participating shards do then have to be reduced to a single response, so there's some fuzziness in the tradeoff between the incremental gain of searching, say, 10 shards of 1GB each versus 1 shard of 10GB and the difference in reduce overhead, but that same comparison applies whether the shards are on 10 separate nodes or on one node with really good compute and storage resources.
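If it helps to picture the shape of that process, here's a minimal sketch of the fan-out/reduce pattern in plain Python (this is not Elasticsearch internals, and the hit data is made up): each shard search runs independently, and only the final merge is serial.

```python
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard_id):
    # Stand-in for per-shard work: in Elasticsearch this is a Lucene
    # search that can proceed in memory until a disk read is needed.
    return [(shard_id, score) for score in (3.2, 1.7)]  # hypothetical hits

shard_count = 10  # e.g. 10 x 1GB shards instead of 1 x 10GB

# Fan-out: every shard search can run concurrently, limited only by
# the CPU and storage the OS can actually hand out.
with ThreadPoolExecutor(max_workers=shard_count) as pool:
    per_shard = list(pool.map(search_shard, range(shard_count)))

# Reduce: merge every shard's results into one ranked response.
merged = sorted((hit for hits in per_shard for hit in hits),
                key=lambda h: h[1], reverse=True)[:10]
print(merged)
```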
However, realistically you need to think about your total request load (indexing and searching) when considering sharding.
If you expect to have 100 searches per second on that index, then with 10 shards on the same node that's 1,000 shard searches per second contending for resources instead of 100, and that's one reason the prevailing advice is to have fewer shards. Then, to the extent that you are willing and able to allocate additional servers, you can reduce search latency by increasing the number of replicas per shard. If you have 4 of those nodes instead of 1, each with its own copy of each shard, the load per node becomes (roughly, there's some randomization) 25 shard searches per second.
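As a concrete sketch of that progression using a recent elasticsearch-py client (the host, index name, and shard counts here are hypothetical, not a recommendation):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical host

# Start with fewer primaries rather than many shards on one node.
es.indices.create(
    index="my-index",  # hypothetical index name
    settings={"number_of_shards": 2, "number_of_replicas": 0},
)

# Later, once you actually have 4 nodes to host the copies, scale
# reads by adding replicas: primary + 3 replicas = 4 full copies,
# one per node, so each node sees ~1/4 of the shard searches.
es.indices.put_settings(
    index="my-index",
    settings={"number_of_replicas": 3},
)
```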
What absolutely never works, though, is when someone with a "fixed" architecture is unhappy with their search performance, reads on the internet that more replicas mean faster searches, and just adds a replica per shard. Why doesn't this work? Because they already designed their cluster to be the minimum amount of resource that will support their indexing rate and total storage while still basically responding to searches, so all of their nodes are already pretty well saturated with activity. Adding another copy of every shard just brings their disks closer to full, increases their heap usage (and garbage collection), increases the indexing load on every node (every copy of a shard performs the same indexing work the primary does), and provides one more shard copy available to be searched. But if all of the copies are on overburdened nodes, it doesn't really matter which node handles the request.
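If you suspect you're in that situation, it's worth checking node saturation before adding replicas. Here's a hedged sketch using the node stats API (what thresholds count as "saturated" is your call):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical host

# Pull CPU, heap, and disk numbers for every node in one call.
stats = es.nodes.stats(metric="os,jvm,fs")
for node in stats["nodes"].values():
    print(
        node["name"],
        f'cpu={node["os"]["cpu"]["percent"]}%',
        f'heap={node["jvm"]["mem"]["heap_used_percent"]}%',
        f'disk_free={node["fs"]["total"]["available_in_bytes"] / 1e9:.1f}GB',
    )
```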
All of that said, back to your original question: defining multiple data paths, to the extent that they aren't actually going to contend for the same device, can be a way to reduce the disk read contention that comes with a lot of concurrent searches on a node.
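For reference, that's a node-level setting in elasticsearch.yml, not something you can change via the REST API. A sketch, assuming the paths sit on separate physical devices and a version that still accepts multiple data paths (the feature was deprecated in 7.13):

```yaml
# elasticsearch.yml
path.data:
  - /mnt/disk1/es-data   # hypothetical mount points, each
  - /mnt/disk2/es-data   # backed by its own physical device
```

Note that Elasticsearch places each shard entirely on one of the paths rather than striping across them, so the benefit comes from different shards reading from different devices.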
Please take a look at The Definitive Guide for a very useful overview of the search process. One conclusion you should be able to draw quite quickly is that searching multiple shards on the same node doesn't behave any differently depending on whether those shards belong to the same index.
I hope this helps.