Searches never distribute across nodes in different AWS Availability Zones

hatertot · December 4, 2024, 11:00pm

I have a cluster of about 19 nodes on ES 7.17.1. They are all definitely part of the same cluster. Each node has an entire copy of every index. 11 are in us-east-1f, 6 are in us-east-1d, 1 is in us-east-1c and 1 is in us-east-1b. I'd been trying to figure out why requests routed to us-east-1f were so much faster than those routed to any other zone when I discovered, using profile=true, that every query only ever uses shards in the same availability zone as the node that received it. I've tried and verified countless requests. Within a zone, searches do seem to distribute load well across the available shards.
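For anyone who wants to repeat this kind of check, here is a minimal sketch of how it could be scripted. It is not the script used above. It assumes the search response was requested with profile enabled, that each profiled shard carries an "id" of the form "[nodeId][indexName][shardNumber]", and that you have already built a node-ID-to-zone mapping yourself (for example from the node attributes). The function name zones_used, the node IDs, and the sample data are all hypothetical.

```python
import re
from collections import Counter

# Profile shard IDs are assumed to look like "[nodeId][indexName][shardNumber]".
SHARD_ID = re.compile(r"^\[([^\]]+)\]\[(.+)\]\[(\d+)\]$")


def zones_used(profile_response, node_zone):
    """Count how many profiled shards were served from each availability zone.

    profile_response: parsed JSON body of a search run with profiling enabled.
    node_zone: dict mapping node ID -> availability zone (built by the caller).
    """
    counts = Counter()
    for shard in profile_response.get("profile", {}).get("shards", []):
        match = SHARD_ID.match(shard["id"])
        if match is None:
            continue  # skip IDs that do not follow the assumed format
        node_id = match.group(1)
        counts[node_zone.get(node_id, "unknown")] += 1
    return dict(counts)


# Hypothetical sample: three shards, all served by nodes in one zone.
sample = {
    "profile": {
        "shards": [
            {"id": "[nodeA][my_index][0]"},
            {"id": "[nodeB][my_index][1]"},
            {"id": "[nodeA][my_index][2]"},
        ]
    }
}
node_zone = {"nodeA": "us-east-1f", "nodeB": "us-east-1f", "nodeC": "us-east-1d"}

print(zones_used(sample, node_zone))  # prints {'us-east-1f': 3}
```

If every request comes back with a single zone in the result, as in the sample, the search is staying inside one availability zone.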

GET /my_index/_search_shards shows every shard in every availability zone and each one has "state": "STARTED". Cluster status is green. All nodes have 7 search threads and 0 queue (from /_nodes/stats/thread_pool).

I've disabled adaptive replica selection:

GET /_cluster/settings
--> response:
{
    "persistent": {
        "cluster": {
            "routing": {
                "use_adaptive_replica_selection": "false"
            }
        }
    },
    "transient": {}
}

I'm still having the same issue. My only guess at this point is that disabling adaptive replica selection has had no effect, or that it needs a rolling restart to take effect, which I'm hesitant to perform on our production cluster unless someone thinks it's actually required.

I was hoping someone might have any helpful insight for me here.
Thanks much for reading.

hatertot · December 6, 2024, 11:01pm

I found, hidden in a tip in the docs, that even with adaptive replica selection disabled, if the cluster has shard allocation awareness attributes, Elasticsearch will prefer shards on nodes with the same awareness attribute values when routing searches.

My cluster has a default awareness attribute, "aws_availability_zone". I wonder if it was set because I'm using EC2 discovery. Anyway, I followed the advice in the docs and set the JVM system property es.search.ignore_awareness_attributes=true (passed to the JVM as -Des.search.ignore_awareness_attributes=true), and now my searches distribute beautifully across the entire cluster, which was really important because my use case requires large _msearch requests.

happy problem solving fellow problem solvers
