I have been doing something similar lately. I faced the same issues you are facing and solved them by following the documentation on tuning approximate kNN search performance.
A quick summary of what I am doing:
- Roughly 90,000,000 docs,
- Each doc has a dense vector of size 768,
- 5 nodes (32 vCPU, 128 GB RAM each)
A couple of things that really made a difference:
- Make sure you have sufficient RAM (Analyze index disk usage API | Elasticsearch Guide [8.10] | Elastic), and leave some spare RAM for other processes. Also note that by default your system uses 50% of memory for the JVM heap, I believe, so the vector data only has the remaining half to fit into. My guess is that your 24GB per shard exceeds this threshold.
- Use forcemerge to reduce segments (I used 2 per shard). Merging segments really helped query speed! There are downsides to having very few segments as well, so it is worth finding a balance here.
- Preload the kNN index into memory (Preloading data into the file system cache | Elasticsearch Guide [8.9] | Elastic). This really helped a lot as well! But when you do this, make sure the index can fit into your RAM.
- Once you have sufficient RAM and have set up preloading of the kNN index, restart the cluster and rerun your experiment. Check IO on your cluster while doing so: when the kNN index does NOT fit into memory, you will see high read IO.
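For reference, the force-merge, preload, and disk-usage steps above look roughly like this via the REST API (the index name `my-knn-index` is a placeholder; the preload extensions are the ones the preloading docs list for vector data, the HNSW graph, and its metadata):

```
# Reduce segment count (I aimed for ~2 segments per shard)
POST /my-knn-index/_forcemerge?max_num_segments=2

# Preload the kNN files into the file system cache.
# Note: index.store.preload is a static setting, so set it at index
# creation time or on a closed index.
PUT /my-knn-index/_settings
{
  "index.store.preload": ["vec", "vex", "vem"]
}

# Check how much disk each field actually uses (Analyze index disk usage API)
POST /my-knn-index/_disk_usage?run_expensive_tasks=true
```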
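To sanity-check the RAM bullet above: the Elastic guidance on tuning approximate kNN search gives a rough rule of thumb of about `num_vectors * 4 * (dims + 12)` bytes of file system cache for float vectors with HNSW. Treating that as an approximation, a quick back-of-the-envelope for my setup looks like this:

```python
# Rough off-heap RAM estimate for float dense vectors with HNSW.
# The formula num_vectors * 4 * (dims + 12) is the approximation from
# Elastic's kNN tuning guidance; treat the result as a ballpark, not exact.

def knn_ram_bytes(num_vectors: int, dims: int) -> int:
    """Approximate bytes of file system cache needed for the kNN structures."""
    return num_vectors * 4 * (dims + 12)

total = knn_ram_bytes(90_000_000, 768)  # my setup: 90M docs, 768 dims
per_node = total / 5                    # spread evenly over 5 data nodes

print(f"total:    {total / 1024**3:.0f} GiB")
print(f"per node: {per_node / 1024**3:.0f} GiB")
```

With 128 GB nodes and ~50% going to heap, that leaves roughly 64 GB of file system cache per node, so ~52 GiB of vector data per node only just fits, which matches why IO pressure shows up so quickly.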
There are a couple of challenges that lie ahead once you do these things, I noticed:
- (Heavy) indexing into this index will increase the segment count again, making your queries slow again! I have not found a proper way to deal with this.
- Heavy operations, like expensive queries or heavy indexing on the cluster (even into another index), may push the kNN index out of memory, making it terribly slow again! Yesterday I started a thread on this: Manage heavy indexing in kNN indexes
Hope this helps a bit!