2.78 TB storage | 15 GB RAM | 1.9 vCPU, 2 zones
Is this profile PER node or the total resources over 2 zones?
This isn't the vector profile, and I am not sure which profile it is. I am assuming a storage-dense one, since the ratio of disk to RAM and vCPU is very high.
I deployed my own single node with 15GB | 1.9 vCPU and called _nodes. It reports "-Des.total_memory_bytes=16101933056",
and for the JVM allocation, "heap_max_in_bytes": 8053063680
So the single node has about 7.5GB of off-heap memory.
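For reference, here is the arithmetic behind that number as a small Python sketch. The two values are copied from the _nodes output above; the variable names are only for illustration.

```python
# Values copied from the _nodes output quoted above.
total_memory_bytes = 16101933056  # from -Des.total_memory_bytes
heap_max_in_bytes = 8053063680    # from "heap_max_in_bytes"

GIB = 1024 ** 3

# Whatever the JVM heap does not claim is left for off-heap use,
# which includes the filesystem cache that vector search relies on.
off_heap_bytes = total_memory_bytes - heap_max_in_bytes

print(f"total:    {total_memory_bytes / GIB:.2f} GiB")
print(f"heap:     {heap_max_in_bytes / GIB:.2f} GiB")
print(f"off heap: {off_heap_bytes / GIB:.2f} GiB")
```

The heap takes half of the node's memory, so only the other half is available for vectors and everything else that lives off heap.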
Because it has replicas, your data set requires about 15GB for vectors alone. But you are also running other types of queries. You are right on the edge of what the vectors need, with no leeway at all for other queries, which also need some memory to run (term postings, etc.).
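If you want to redo this estimate yourself, here is a rough sketch. The per-vector byte counts are the approximations I recall from Elastic's kNN tuning guidance, so please verify them against the docs for your version. The vector count and dimensions in the example call are hypothetical placeholders, not your real numbers.

```python
# Approximate bytes of vector data per vector, by index type.
# Assumption: these follow Elastic's published sizing guidance;
# check the documentation for your version before relying on them.
BYTES_PER_VECTOR = {
    "hnsw": lambda dims: dims * 4,          # raw float32 vectors
    "int8_hnsw": lambda dims: dims + 4,     # 1 byte per dimension plus overhead
    "int4_hnsw": lambda dims: dims / 2 + 4, # half a byte per dimension plus overhead
}


def estimate_vector_ram_bytes(num_vectors, dims, index_type, replicas=1, hnsw_m=16):
    """Rough off-heap memory needed to keep vectors and the HNSW graph cached."""
    per_vector = BYTES_PER_VECTOR[index_type](dims)
    graph_per_vector = 4 * hnsw_m  # HNSW graph links, m defaults to 16
    copies = 1 + replicas          # the primary plus each replica
    return int(num_vectors * (per_vector + graph_per_vector) * copies)


# Hypothetical inputs: replace with your own vector count and dimensions.
for index_type in BYTES_PER_VECTOR:
    needed = estimate_vector_ram_bytes(5_000_000, 1024, index_type)
    print(f"{index_type:10s} {needed / 1024 ** 3:6.2f} GiB")
```

Compare the result with the off-heap memory summed across your nodes, and leave headroom for the non-vector queries.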
I suggest:
- going up a level in node size (you don't need to go to the max level of 60)
- increasing the quantization to int4_hnsw
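For the quantization change, the mapping would look roughly like the sketch below, built as a Python dict so you can pass it to whichever client you use. The field name, dims and similarity are hypothetical placeholders. As far as I know int4_hnsw needs an even number of dimensions, and whether you can change index_options in place or need to reindex depends on your version, so check that first.

```python
import json

FIELD_NAME = "embedding"  # hypothetical field name, use your own
DIMS = 768                # hypothetical; int4 quantization needs an even value

mapping = {
    "properties": {
        FIELD_NAME: {
            "type": "dense_vector",
            "dims": DIMS,
            "index": True,
            "similarity": "cosine",  # hypothetical; keep whatever you use today
            "index_options": {"type": "int4_hnsw"},
        }
    }
}

# Print the JSON body you would send when creating or updating the mapping.
print(json.dumps(mapping, indent=2))
```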
Your latency would also improve with more vCPUs.