The main take-away for me was to use the "index.refresh_interval": "-1" setting and to run a first request with "_source": false to get to acceptable performance. Thanks again @mayya @Julie_Tibshirani
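For reference, this is roughly what those two settings look like with the Python client — a minimal sketch only: index and field names below are placeholders, and I'm assuming a recent 8.x cluster and client where kNN is available as the top-level knn option of _search (on earlier 8.x releases the separate _knn_search endpoint would be used instead):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Disable automatic refreshes while bulk indexing; set it back (or refresh
# manually) before searching.
es.indices.put_settings(
    index="index_d_cos",                            # placeholder index name
    settings={"index": {"refresh_interval": "-1"}},
)

# ANN warm-up query: skipping _source avoids fetching stored fields from disk.
resp = es.search(
    index="index_d_cos",
    knn={
        "field": "embedding",          # placeholder vector field name
        "query_vector": [0.0] * 768,   # placeholder query vector
        "k": 10,
        "num_candidates": 100,
    },
    source=False,
)
print(resp["hits"]["total"])
```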
We added another index, Index_d, with more than 105 million documents with 768-dimensional vectors. This index might grow to over 200 million documents.
So in total there are:
Index a_cos: ~6 million documents
Index b_cos: ~10 million documents
Index c_cos: ~3 million documents
Index_d_cos: ~105 million documents
It currently takes 58 minutes to run a single ANN search.
That is way too long for our use case.
My hypothesis is that the index does not fit in the RAM.
The setup is in a cloud environment where we currently have:
1 node
8 vCPUs
128 GB of RAM
4 TB of SSD storage
So now my questions are:
What would be a better setup to reach acceptable performance (requirement: ~1 s)?
Cluster size?
Node size?
vCPUs
RAM
SSD storage
Are there any additional tweaks to improve performance?
58 minutes seems like an extremely long time for an ANN search. Are you sure that this search is not blocked on indexing? Are you running these searches after all indexing is done and the index has been refreshed?
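If refreshes were disabled during indexing (as with "index.refresh_interval": "-1" above), a manual refresh is needed before search results and timings are meaningful. A minimal sketch with the Python client, reusing the placeholder index name from the earlier snippet:

```python
# Force a refresh once bulk indexing is complete so all segments are searchable.
es.indices.refresh(index="index_d_cos")  # placeholder index name
```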
For the fastest searches we recommend having enough RAM for all vectors to fit in memory. For example, if you have 200M vectors of 768 dims and each dim is a float taking 4 bytes, a comfortable RAM size should be at least 4 * 768 * 200M ≈ 614 GB. That's really a lot (a quick back-of-the-envelope sketch follows after this list). Several ways to address it:
distribute the vector search across several machines
reduce the number of dims. 768 is a lot of dimensions; is there a way to reduce them?
quantize vector values to lower precision (e.g. 8 bits instead of 32 bits). This is still work in progress on the Lucene side and is currently not supported in Elasticsearch, but we aspire to have it.
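Here is that back-of-the-envelope calculation as a tiny Python sketch (vector data only; the HNSW graph adds further per-vector overhead, and the 8-bit line is purely hypothetical since quantization isn't supported yet):

```python
# Rough estimate of RAM needed to keep the raw vector data in the OS page cache.
# Only the float values are counted; HNSW graph structures add extra overhead.
num_vectors = 200_000_000
dims = 768

for label, bytes_per_dim in [("float32", 4), ("8-bit quantized (hypothetical)", 1)]:
    total_gb = num_vectors * dims * bytes_per_dim / 1e9
    print(f"{label}: ~{total_gb:.0f} GB")

# Output:
# float32: ~614 GB
# 8-bit quantized (hypothetical): ~154 GB
```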