We have data nodes in 3 different AWS AZs (availability zones) and 3 separate coordinator nodes.
We noticed that the data nodes in one of the AZs experience less load than the nodes in the other 2 AZs.
It turned out that one of the coordinator nodes was in the wrong AZ: it was in an availability zone in which the cluster has no data nodes at all.
In other words, we had data nodes in zones A, B and C, and 3 coordinator nodes in zones A, B and D, but none in C. As a result, the data nodes in zone C received less load (lower CPU usage, etc.).
When we replaced the coordinator node in AZ D with a node in AZ C, the load became balanced.
Our coordinator nodes are behind an ELB, and the ELB metrics show that all 3 coordinator nodes were receiving the same number of requests.
We have the AWS zone awareness plugin enabled, which makes sure that the primary and replica of any shard are never in the same AZ.
We have 40 nodes in the cluster and every index has exactly 40 shards (20 primaries and 20 replicas), hence our shard distribution across the nodes is perfectly even.
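
For anyone who wants to verify the same kind of distribution, here is a minimal sketch of how shard copies could be tallied per zone. It assumes whitespace-separated text in the shape of `GET _cat/shards?h=index,shard,prirep,node`, and a node-to-zone mapping that you supply yourself. The function name, node names and sample rows are all made up for illustration; this is not output from our cluster.

```python
from collections import Counter

def shards_per_zone(cat_shards_text, node_to_zone):
    """Count shard copies per zone.

    Expects whitespace-separated columns: index shard prirep node
    (the shape of GET _cat/shards?h=index,shard,prirep,node).
    Assumes node names contain no spaces.
    """
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # unassigned shards have no node column; skip them
        node = parts[3]
        counts[node_to_zone[node]] += 1
    return dict(counts)

# Made-up sample rows and a hypothetical node-to-zone mapping.
sample = """
logs 0 p data-a1
logs 0 r data-b1
logs 1 p data-b1
logs 1 r data-c1
logs 2 p data-c1
logs 2 r data-a1
"""
zones = {"data-a1": "A", "data-b1": "B", "data-c1": "C"}

print(shards_per_zone(sample, zones))  # {'A': 2, 'B': 2, 'C': 2}
```

Equal counts per zone (and per node) would confirm that the uneven load is not caused by uneven shard placement.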
The Elasticsearch version is 6.8.
Is there anything that would make a coordinator node "prefer" data nodes in its own zone and avoid routing requests to data nodes in other AZs?
NOTE: We don't have the adaptive replica selection feature enabled.
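
To show why we suspect a same-zone preference, here is a rough model of what the load would look like if such a preference existed. It is only a sketch of our hypothesis, not confirmed Elasticsearch behaviour. It assumes every shard has two copies in two different zones, that all zone pairs are equally common, that each coordinator gets the same number of requests, and that a coordinator with no local copy splits requests evenly between the two copies. The function name `expected_load` is made up.

```python
from fractions import Fraction
from itertools import combinations

def expected_load(data_zones, coordinator_zones):
    """Fraction of shard-level requests landing in each data zone,
    under the ASSUMED rule: use the same-zone copy if one exists,
    otherwise split evenly between the two copies."""
    load = {z: Fraction(0) for z in data_zones}
    # Each shard has one copy in each of two different zones.
    placements = list(combinations(data_zones, 2))
    for coord in coordinator_zones:
        for copies in placements:
            if coord in copies:
                load[coord] += 1  # local copy takes the whole request
            else:
                for z in copies:
                    load[z] += Fraction(1, 2)  # no local copy: split
    total = sum(load.values())
    return {z: share / total for z, share in load.items()}

# Coordinators in A, B, D (our original layout): zone C gets less.
print(expected_load(["A", "B", "C"], ["A", "B", "D"]))
# A: 7/18, B: 7/18, C: 2/9

# Coordinators in A, B, C (after the fix): perfectly balanced.
print(expected_load(["A", "B", "C"], ["A", "B", "C"]))
# A: 1/3, B: 1/3, C: 1/3
```

Under these assumptions zone C would receive about 22% of the load instead of 33%, which matches the direction of what we observed.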