The community knows that this uneven distribution of primary shards happens, but the prevailing message seems to be that it does not matter.
This is a problem for the following reasons:
- Indexing must happen on the primary shard first, before it is copied to the replica.
- Bulk updates: all requests to index to the primary shards are bottlenecked at one machine, so my cluster ends up with rejected requests. The machine ultimately buckles under the stream of update requests. The number below is the rejected request count for the machine pictured above.
CLUSTER03 bulk 0 0 67994
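
For anyone who wants to check their own cluster for the same symptom, here is a minimal Python sketch that reads lines like the one above and reports which nodes have rejected bulk requests. It assumes the columns are node name, pool name, active, queue, rejected, in that order, which is how I read the line above. The helper names and the `CLUSTER01` row are made up for illustration, and newer Elasticsearch versions call this pool `write` instead of `bulk`, so adjust the `pool` argument as needed.

```python
from typing import Dict, List, NamedTuple


class PoolStats(NamedTuple):
    node: str
    pool: str
    active: int
    queue: int
    rejected: int


def parse_thread_pool(text: str) -> List[PoolStats]:
    """Parse lines assumed to look like 'node pool active queue rejected'."""
    stats = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank lines or rows with an unexpected shape
        node, pool, active, queue, rejected = parts
        try:
            stats.append(
                PoolStats(node, pool, int(active), int(queue), int(rejected))
            )
        except ValueError:
            continue  # skip a header row or any non-numeric row
    return stats


def nodes_with_rejections(stats: List[PoolStats], pool: str = "bulk") -> Dict[str, int]:
    """Return {node: rejected count} for nodes that rejected requests in `pool`."""
    return {s.node: s.rejected for s in stats if s.pool == pool and s.rejected > 0}


if __name__ == "__main__":
    # First row is the line from this post; CLUSTER01 is a hypothetical healthy node.
    sample = "CLUSTER03 bulk 0 0 67994\nCLUSTER01 bulk 0 0 0\n"
    for node, rejected in nodes_with_rejections(parse_thread_pool(sample)).items():
        print(node, rejected)
```

If only one node shows up in the result while the others sit at zero, that matches the pattern described above: the node holding most of the primaries is the one turning requests away.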