Confusion on cluster operation and sizing

Hi everyone

I am quite new to Elasticsearch. I have been reading the documentation for a while, but I still have a few questions and am confused about some parts. My environment is as follows:

I have a cluster of three (3) servers: es1, es2 and es3. The first server, es1, is the primary, where Logstash and Kibana are installed; the remaining servers have only Elasticsearch installed and are just members of the ES cluster. So far I am using 5 primary shards and one replica.
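
In case it helps, here is a minimal Python sketch of how I could check which node actually holds which shard. It assumes the `_cat/shards` API is reachable over plain HTTP on es1 at port 9200 (the default; mine may differ) without authentication. The function names are my own and are not part of Elasticsearch.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_shard_rows(base_url="http://es1:9200"):
    """Fetch one row per shard copy from the _cat/shards API as JSON.

    The base URL is an assumption about my setup, not a fixed value.
    """
    with urlopen(f"{base_url}/_cat/shards?format=json") as resp:
        return json.load(resp)


def shards_by_node(rows):
    """Group shard copies by node name; 'p' marks a primary, 'r' a replica."""
    layout = defaultdict(list)
    for row in rows:
        # Unassigned shards have no node, so file them under a placeholder.
        node = row.get("node") or "UNASSIGNED"
        layout[node].append((row["index"], int(row["shard"]), row["prirep"]))
    return {node: sorted(copies) for node, copies in layout.items()}
```

Calling `shards_by_node(fetch_shard_rows())` against the cluster should show, per node, which primaries and replicas it currently holds.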

All traffic goes through the primary server es1, with no other client or application directly accessing the other two "slave servers".

My thoughts and questions are:

According to the documentation, when a client requests particular data from es1 and that data is located on, let's say, es2, the primary server (es1 in our case) forwards the request to es2 and, once it has received the answer (the data), delivers it back to the client.

So in my case, where all traffic goes to es1, wouldn't that actually decrease performance? The way I think of it, if all the data were located on es1, in other words locally, the server would respond faster than if it had to request the data from another server. Is that correct, or am I thinking about this wrong?

Having 3 nodes with five primary shards and one replica means that some primary shards will be allocated to the slave servers. If I set the number of replicas to two (2), all three nodes will end up holding the exact same data: as I understand it, all primary shards on es1 and one replica on each secondary server. But in that case all requests would be served from es1, since all the data is located locally. Is that correct?
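
To make the numbers concrete, this is a small Python sketch of how I am counting shard copies. It relies on my understanding that Elasticsearch never places two copies of the same shard on the same node; the function names are made up for illustration. It only counts copies and says nothing about which node ends up holding the primaries.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard exists once plus once per configured replica."""
    return primaries * (1 + replicas)


def every_node_holds_all_data(nodes: int, replicas: int) -> bool:
    """True when each shard has at least as many copies as there are nodes.

    Assumes copies of one shard always land on different nodes, so with
    enough copies every node ends up with one copy of every shard.
    """
    return 1 + replicas >= nodes


# My current setup: 3 nodes, 5 primaries, 1 replica.
print(total_shard_copies(5, 1), every_node_holds_all_data(3, 1))
# The setup I am considering: 3 nodes, 5 primaries, 2 replicas.
print(total_shard_copies(5, 2), every_node_holds_all_data(3, 2))
```

With one replica there are 10 shard copies spread over the 3 nodes, so no node has everything; with two replicas there are 15 copies, 5 per node.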

Thanks in advance for any help.