hey friends
I posted previously about our efforts to optimise our dedicated kNN cluster.
The latest thing we've been trying to understand and solve is this: when we run load/stress tests, we often see one or two nodes serving the bulk of the requests. Those nodes sit at 80-100% CPU and slow down or queue search requests, while the other nodes are chilling at around 20% CPU and serving far fewer requests.
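One way to put numbers on the skew during a load test is to snapshot `GET _nodes/stats/indices/search` before and after the run and compare each node's share of `query_total`. The sketch below assumes both snapshots are already parsed into dicts. Because there is only one shard, each search runs its query phase on a single copy, so the per-node delta roughly tracks how many searches each node served. The function name and sample numbers are made up for illustration.

```python
def search_share_per_node(before: dict, after: dict) -> dict:
    """Fraction of query-phase executions each node handled between two
    parsed snapshots of GET _nodes/stats/indices/search."""
    deltas = {}
    for node_id, node in after["nodes"].items():
        end = node["indices"]["search"]["query_total"]
        # A node missing from the first snapshot (e.g. it just joined) counts from zero.
        prev = before["nodes"].get(node_id)
        start = prev["indices"]["search"]["query_total"] if prev else 0
        # Label by node name when present, falling back to the node id.
        deltas[node.get("name", node_id)] = end - start
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: count / total for name, count in deltas.items()}


# Made-up snapshots: node-1 served 800 of the 1000 queries in the window.
before = {"nodes": {
    "a1": {"name": "node-1", "indices": {"search": {"query_total": 100}}},
    "b2": {"name": "node-2", "indices": {"search": {"query_total": 100}}},
}}
after = {"nodes": {
    "a1": {"name": "node-1", "indices": {"search": {"query_total": 900}}},
    "b2": {"name": "node-2", "indices": {"search": {"query_total": 300}}},
}}
print(search_share_per_node(before, after))  # {'node-1': 0.8, 'node-2': 0.2}
```

With ARS skewing traffic, you would expect one or two nodes to hold most of the share; with even distribution, each node should sit near `1 / node_count`.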
We disabled Adaptive Replica Selection (ARS), which seems to have had the desired effect: load is now consistently distributed across all nodes, resulting in a happy cluster.
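For anyone wanting to try the same thing, ARS is controlled by the dynamic cluster setting `cluster.routing.use_adaptive_replica_selection` (on by default). Below is a minimal stdlib-only sketch that builds the `PUT _cluster/settings` request. The helper name and base URL are hypothetical, and it assumes an unauthenticated cluster, so you'd need to add auth/TLS for a real deployment.

```python
import json
import urllib.request


def ars_setting_request(base_url: str, enabled: bool) -> urllib.request.Request:
    """Build (but don't send) a request that toggles Adaptive Replica Selection."""
    body = {"persistent": {"cluster.routing.use_adaptive_replica_selection": enabled}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = ars_setting_request("http://localhost:9200", enabled=False)
# To actually apply it: urllib.request.urlopen(req)
```

Using `persistent` keeps the setting across full cluster restarts. Passing `enabled=True` flips it back if you want to compare behaviour under the same load test.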
What we don't understand is: WHY do we see this skewed load with the default Adaptive Replica Selection switched on?
We have a very simple setup: ~120k docs in a single index consisting of just a keyword (id) field and a dense vector field, with one shard (one primary, plus a replica on each of the other nodes).
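To make the setup easier to reproduce, here is a rough sketch of a create-index body matching that description, built as a Python dict. The vector field name `embedding`, the `dims` value, and `cosine` similarity are assumptions (the post doesn't say), and `"index": True` plus `similarity` reflect 8.x-style `dense_vector` mappings for approximate kNN. Setting `number_of_replicas` to `node_count - 1` puts one copy of the single shard on every node.

```python
def knn_index_body(dims: int, node_count: int) -> dict:
    """Hypothetical create-index body: one shard, a copy on every node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "settings": {
            "number_of_shards": 1,
            # One primary plus a replica on each remaining node.
            "number_of_replicas": node_count - 1,
        },
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                # Field name, dims and similarity are illustrative assumptions.
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        },
    }


body = knn_index_body(dims=384, node_count=4)
```

An alternative to hard-coding the replica count is the index setting `"auto_expand_replicas": "0-all"`, which keeps a copy on every data node as the cluster grows or shrinks.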
We've never seen a load distribution problem on our main, traditional keyword-search cluster, so this seems to be specific to vector / approximate kNN search? And we're baffled!
Any ideas what might be going on? We're all out of ideas!