Uneven load onto data nodes in different AWS Availability Zones

a06ced31bae02498a46d · November 21, 2019, 10:52am

We have data nodes in 3 different AWS availability zones (AZs) and 3 separate coordinator nodes.
We noticed that the data nodes in one of the AZs experience less load than the nodes in the other 2 AZs.
It turned out that one of the coordinator nodes was in the wrong AZ: it was in an availability zone in which the cluster has no data nodes at all.

In other words, we had data nodes in zones A, B and C, and 3 coordinator nodes in zones A, B and D, but none in C. As a result, the data nodes in zone C received less load: lower CPU usage, etc.

When we replaced the coordinator node in AZ D with a node in AZ C, the load became balanced.

Our coordinator nodes are behind an ELB, and the ELB metrics show that all 3 coordinator nodes were receiving the same number of requests.

We have the AWS zone awareness plugin enabled, which makes sure that the primary and replica of any shard are never in the same AZ.
We have 40 nodes in the cluster and every index has exactly 40 shards (20 primaries and 20 replicas), hence our shard distribution across the nodes is perfectly even.
The Elasticsearch version is 6.8.

Is there anything that would make a coordinator node "prefer" data nodes in the same zone and avoid routing requests to data nodes in different AZs?

NOTE: We don't have the adaptive replica selection feature enabled.

DavidTurner · November 21, 2019, 12:02pm

Yes, allocation awareness does that:

Elasticsearch prefers using shards in the same location (with the same awareness attribute values) to process search or GET requests. Using local shards is usually faster than crossing rack or zone boundaries.

That's true of all versions released to date, but it won't be in 8.0.0, and in 7.5.0 there is also a system property to disable this behaviour.
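
To see why a same-zone preference would produce the skew described in the question, here is a rough back-of-the-envelope model in Python. It is not Elasticsearch's routing code, just a sketch under stated assumptions: every shard has two copies in two different zones, the zone pairs are used equally often, every coordinator receives the same number of searches, each search hits one copy of every shard, and a coordinator with no local copy splits its requests evenly between the copies. The function name `zone_load_share` is made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def zone_load_share(data_zones, coordinator_zones, copies_per_shard=2):
    """Fraction of shard-level requests landing in each data zone.

    Simplified model (an assumption, not Elasticsearch internals):
    a coordinator uses a copy in its own zone when one exists,
    otherwise it spreads requests evenly over all copies.
    """
    # Every way of placing the copies of a shard in distinct zones,
    # each assumed to occur equally often.
    placements = list(combinations(sorted(data_zones), copies_per_shard))
    load = {zone: Fraction(0) for zone in data_zones}

    for coord_zone in coordinator_zones:
        for placement in placements:
            local = [zone for zone in placement if zone == coord_zone]
            candidates = local or list(placement)
            for zone in candidates:
                load[zone] += Fraction(1, len(candidates))

    total = sum(load.values())
    return {zone: share / total for zone, share in load.items()}


# Coordinators in A, B, D (the original layout) vs. A, B, C (after the fix).
skewed = zone_load_share("ABC", "ABD")
balanced = zone_load_share("ABC", "ABC")

for name, shares in (("A,B,D", skewed), ("A,B,C", balanced)):
    summary = ", ".join(f"{zone}: {float(share):.1%}" for zone, share in shares.items())
    print(f"coordinators in {name} -> {summary}")
```

Under these assumptions zones A and B each take 7/18 of the shard requests and zone C takes 2/9 when the coordinators sit in A, B and D, while moving the third coordinator to C gives each zone exactly one third. The real numbers in a cluster will differ, but the direction of the skew matches what was observed.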

