Adaptive Replica Selection and knn load balancing

peedeeboy · May 9, 2025, 1:55pm

hey friends

I posted previously about our efforts on optimising our dedicated knn cluster.

The latest thing we've been trying to understand / solve is why, when we run load/stress tests, we often see 1 or 2 nodes serving the bulk of requests (and slowing down / queuing search requests at 80 - 100% CPU usage) whilst other nodes are chilling at 20% CPU usage and serving fewer requests.

We disabled Adaptive Replica Selection, which seems to be having the desired effect: load is consistently distributed across all nodes, resulting in a happy cluster.
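For anyone wanting to try the same thing, here is a minimal sketch of switching Adaptive Replica Selection off through the cluster settings API. The setting name `cluster.routing.use_adaptive_replica_selection` is the documented one; the `http://localhost:9200` address, the choice of a persistent (rather than transient) setting and the helper names are assumptions for illustration only.

```python
import json
import urllib.request

# Documented Elasticsearch cluster setting that toggles Adaptive Replica Selection.
ARS_SETTING = "cluster.routing.use_adaptive_replica_selection"


def build_ars_settings(enabled: bool) -> dict:
    """Body for PUT /_cluster/settings (hypothetical helper, persistent scope assumed)."""
    return {"persistent": {ARS_SETTING: enabled}}


def build_request(base_url: str, enabled: bool) -> urllib.request.Request:
    """Prepare, but do not send, the settings update request."""
    body = json.dumps(build_ars_settings(enabled)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed local, unsecured cluster address; adjust for a real deployment.
    req = build_request("http://localhost:9200", enabled=False)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the change: urllib.request.urlopen(req)
```

With the setting off, Elasticsearch falls back to round-robin style selection among the eligible shard copies, which matches the even spread described above.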

What we don't understand is WHY we see this skewed load with the default Adaptive Replica Selection switched on?

We have a very simple setup: ~120k docs in an index consisting of just a keyword (id) field and a dense vector field, with a single shard (one primary, with a replica on each of the other nodes).
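As a rough sketch of that layout, the index creation body would look something like the following. The field name `embedding`, the 384 dimensions, the `cosine` similarity and the 5-node cluster size are placeholders chosen for illustration, not values from this cluster.

```python
import json


def build_knn_index_body(dims: int, node_count: int) -> dict:
    """Index body with one primary shard and a replica on every other node.

    `dims` and `node_count` are illustrative parameters (assumptions).
    """
    if dims < 1 or node_count < 1:
        raise ValueError("dims and node_count must be positive")
    return {
        "settings": {
            "number_of_shards": 1,
            # One copy per remaining node, so every node holds the full index.
            "number_of_replicas": node_count - 1,
        },
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                # Hypothetical field name; similarity is an assumed choice.
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values: 384-dimensional vectors on a 5-node cluster.
    print(json.dumps(build_knn_index_body(dims=384, node_count=5), indent=2))
```

With this layout every node holds a complete copy of the data, so any node can answer any search on its own, and replica selection alone decides where the load lands.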

We've never seen a problem with load distribution on our main traditional keyword cluster - seems to be an issue specifically with vector / approximate knn searching? And we're baffled!

Any ideas what might be going on? We're all out of ideas!
