It's unexpected that a primary should differ from a replica in this way, since they should both perform roughly the same I/O. Can you share some more quantitative details, including the numerical measurements you have made?
The main exception to this rule is an update-heavy workload, in which the primary must compute each update (fetch the current document and apply the change) before the resulting document is sent to the replicas. Are you using updates heavily? Your other recent question suggests that you use daily indices, which normally means a workload with many inserts but few updates, but could you clarify?
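To illustrate why updates tilt the work towards the primary, here is a toy model in Python. It is not Elasticsearch's implementation: the `Shard` class, the one-unit-per-read/write cost counter and the shallow merge are all simplifying assumptions. The point it shows is that an insert costs the same on both copies, whereas a partial update makes only the primary read and merge before both copies write the finished document.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard copy. `work` is a hypothetical cost counter: 1 unit per read or write."""
    docs: dict = field(default_factory=dict)
    work: int = 0

    def index(self, doc_id, source):
        self.docs[doc_id] = dict(source)
        self.work += 1

    def get(self, doc_id):
        self.work += 1
        return dict(self.docs.get(doc_id, {}))


def apply_insert(primary, replica, doc_id, source):
    # An insert is the same indexing work on the primary and the replica.
    primary.index(doc_id, source)
    replica.index(doc_id, source)


def apply_partial_update(primary, replica, doc_id, partial):
    # Only the primary reads the current version and merges the change
    # (a shallow dict merge here, for simplicity)...
    merged = primary.get(doc_id)
    merged.update(partial)
    primary.index(doc_id, merged)
    # ...while the replica just indexes the finished document it is sent.
    replica.index(doc_id, merged)
```

In this model, three inserts followed by two partial updates per document cost 15 units on the primary but only 9 on the replica, while both copies end up with identical documents. With an insert-only workload the two counters stay equal.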
Elasticsearch does not balance primary shards because they're mostly equivalent to replicas. However, it's also a bit surprising that all the primaries end up on one node in a healthy 6-node cluster, particularly if you are using one replica per primary: with equal numbers of primaries and replicas, the node holding all the primaries would hold half of all the shards, three times its even share of one sixth, which is very unbalanced. Could you give some more details about this? It might indicate some kind of misconfiguration.
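The arithmetic behind that imbalance can be sketched in Python as follows. It assumes exactly one replica per primary and, for the other nodes, that the replicas spread evenly across them. The function name and the count of 60 primaries in the usage note are hypothetical. The only Elasticsearch rule it relies on is that a replica is never allocated to the same node as its own primary.

```python
from fractions import Fraction


def layout_if_all_primaries_on_one_node(primaries: int, nodes: int) -> dict:
    """Shard counts when every primary sits on one node and each primary has
    exactly one replica. Hypothetical helper, for illustration only."""
    total = 2 * primaries                    # one primary plus one replica each
    balanced = Fraction(total, nodes)        # shards per node if evenly spread
    # A replica may not share a node with its own primary, so the node holding
    # all primaries holds no replicas at all.
    primary_node = primaries
    # Assumption: the replicas spread evenly over the remaining nodes.
    each_other_node = Fraction(primaries, nodes - 1)
    return {
        "total_shards": total,
        "balanced_per_node": balanced,
        "primary_node": primary_node,
        "each_other_node": each_other_node,
        "primary_node_vs_balanced": primary_node / balanced,
    }
```

For example, with 60 primaries on 6 nodes there are 120 shards, so an even spread would be 20 per node, but the primaries' node holds 60 (3 times its share) while each other node holds 12. The ratio of 3 does not depend on how many primaries there are, which is why the reported layout looks so unbalanced.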