I have a cluster of 19 nodes running Elasticsearch 7.17.1. They are all definitely part of the same cluster, and each node holds a full copy of every index. Eleven nodes are in us-east-1f, six are in us-east-1d, one is in us-east-1c and one is in us-east-1b. I'd been trying to figure out why requests routed to nodes in us-east-1f were so much faster than those routed to any other zone. Using profile=true, I discovered that every query only ever uses shards in the same availability zone as the node the request was routed to. I've verified this across many requests. Within that zone, load does seem to be distributed well across the shards.
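To make the per-zone pattern easier to check across many requests, the profile output can be tallied by zone. This is a minimal Python sketch. It assumes the 7.x profile format, where each entry in `profile.shards` has an `id` of the form `[nodeId][index][shard]`. It also assumes each node exposes its availability zone as a node attribute. The attribute name `zone` is a hypothetical placeholder for whatever attribute the cluster actually uses. The sketch works on already-parsed JSON, so it makes no network calls.

```python
import re
from collections import Counter

# Assumed 7.x profiled shard id format: "[nodeId][indexName][shardNumber]"
_SHARD_ID = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]$")


def zones_used(search_response, nodes_response, zone_attr="zone"):
    """Count how many profiled shard executions ran in each zone.

    search_response: parsed JSON from a search sent with "profile": true
    nodes_response:  parsed JSON from GET /_nodes
    zone_attr:       hypothetical node attribute name holding the AZ
    """
    # Build node id -> zone from the node attributes
    node_zone = {
        node_id: info.get("attributes", {}).get(zone_attr, "unknown")
        for node_id, info in nodes_response["nodes"].items()
    }
    counts = Counter()
    for shard in search_response["profile"]["shards"]:
        match = _SHARD_ID.match(shard["id"])
        if match is None:
            continue  # unexpected id format; skip rather than guess
        node_id = match.group(1)
        counts[node_zone.get(node_id, "unknown")] += 1
    return counts
```

Running this over a batch of profiled searches sent to nodes in each zone would show whether any shard execution ever lands outside the receiving node's zone.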
GET /my_index/_search_shards shows every shard copy in every availability zone, each with "state": "STARTED". Cluster health is green. All nodes have 7 search threads and an empty search queue (per /_nodes/stats/thread_pool).
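The same check can be done from the `_search_shards` output, which lists each node's attributes alongside the shard copies. Below is a minimal sketch that maps every shard to the set of zones holding a STARTED copy. It again assumes a hypothetical `zone` node attribute and takes the parsed JSON as input.

```python
from collections import defaultdict


def started_zones_per_shard(search_shards_response, zone_attr="zone"):
    """Map (index, shard number) -> set of zones holding a STARTED copy.

    search_shards_response: parsed JSON from GET /<index>/_search_shards
    zone_attr:              hypothetical node attribute name holding the AZ
    """
    nodes = search_shards_response["nodes"]
    result = defaultdict(set)
    # "shards" is a list of shard groups; each group lists every copy of one shard
    for copies in search_shards_response["shards"]:
        for copy in copies:
            if copy.get("state") != "STARTED":
                continue  # ignore initializing/relocating copies
            node_info = nodes.get(copy["node"], {})
            zone = node_info.get("attributes", {}).get(zone_attr, "unknown")
            result[(copy["index"], copy["shard"])].add(zone)
    return dict(result)
```

If every set contains all four zones, the copies are available everywhere, which would point at how searches are routed rather than at allocation.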
I've also disabled adaptive replica selection:
```
GET /_cluster/settings
```

Response:

```json
{
  "persistent": {
    "cluster": {
      "routing": {
        "use_adaptive_replica_selection": "false"
      }
    }
  },
  "transient": {}
}
```
I'm still seeing the same behavior. My only guess at this point is that disabling adaptive replica selection has had no effect, or that it needs a rolling restart to take effect. I'm hesitant to do a rolling restart on our production cluster unless someone thinks it's necessary.
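One way to double-check that the setting is actually in effect is to read it back. Keep in mind that a transient value takes precedence over a persistent one, which in turn takes precedence over the default. Below is a minimal sketch that resolves a setting from the parsed `GET /_cluster/settings` response. If `include_defaults=true` is added to the request, a `defaults` section is also present. The sketch handles both the nested form shown above and the `flat_settings=true` form.

```python
_MISSING = object()


def _lookup(section, dotted_key):
    """Find dotted_key in one settings section, in flat or nested form."""
    if dotted_key in section:  # flat_settings=true style
        return section[dotted_key]
    node = section
    for part in dotted_key.split("."):  # nested style
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def effective_setting(settings_response, dotted_key):
    """Return (value, source), checking transient, then persistent, then defaults."""
    for source in ("transient", "persistent", "defaults"):
        value = _lookup(settings_response.get(source, {}), dotted_key)
        if value is not _MISSING:
            return value, source
    return None, None
```

The value comes back as the string `"false"`, not a JSON boolean, so any comparison should be against the string.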
I'm hoping someone here can offer some insight.
Thanks much for reading.