The main takeaway for me was to use the `"index.refresh_interval": "-1"` setting and to run a first request with `"_source": false` to get to acceptable performance. Thanks again @mayya @Julie_Tibshirani
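For anyone following along, those two settings look roughly like this (the index name, field name, and kNN parameters are placeholders; the `knn` section assumes the Elasticsearch 8.x `_search` API):

```
PUT my-vector-index/_settings
{
  "index": { "refresh_interval": "-1" }
}

GET my-vector-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.12, -0.45, 0.33],
    "k": 10,
    "num_candidates": 100
  },
  "_source": false
}
```

Disabling the refresh interval avoids refresh work competing with searches during bulk indexing, and `"_source": false` skips fetching the (large) stored vectors for each hit.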
We added another index, Index_d, with more than 105 million documents, each with a 768-dimensional vector. This index might grow to over 200 million documents.
So in total there are:
Index a_cos: ~6 million
Index b_cos: ~10 million
Index c_cos: ~3 million
Index_d_cos: ~105 million
It currently takes 58 minutes to run an ANN search, which is far too long for our use case.
My hypothesis is that the index does not fit in RAM.
The setup is in a cloud environment where we currently have:
1 node
8 vCPUs
128 GB of RAM
4 TB of SSD storage
So now my questions are:
What would be a better setup to reach acceptable performance (requirement: ~1 s per query)?
Cluster size?
Node size?
vCPUs?
RAM?
SSD storage?
Are there any additional performance tweaks?
58 minutes seems like an extremely long time for an ANN search. Are you sure the search is not blocked by ongoing indexing? Are you running these searches after all indexing is done and the index has been refreshed?
For the fastest searches we recommend having enough RAM for all vectors to fit in memory. For example, if you have 200M vectors of 768 dims, with each dim a float taking 4 bytes, a comfortable RAM size is at least 4 * 768 * 200M ≈ 614 GB, plus overhead for the HNSW graph on top of that. That's really a lot. There are several ways to address it:
distribute vector search across several machines
reduce the number of dims (768 is a lot of dims; is there a way to reduce them?)
quantize vector values to lower precision (e.g. 8 bits instead of 32 bits). This is still a work in progress on the Lucene side and currently not supported in Elasticsearch, but we aspire to support it.
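As a sanity check, the back-of-the-envelope math in the reply above can be written out. Note that this counts only the raw float32 vector data; the HNSW graph structure adds further memory overhead that is not modeled here:

```python
# Rough RAM needed to keep raw float32 vectors fully in memory.
# Counts only the vector values themselves; the HNSW graph adds extra overhead.
def vector_ram_bytes(num_docs: int, dims: int, bytes_per_value: int = 4) -> int:
    return num_docs * dims * bytes_per_value

total = vector_ram_bytes(num_docs=200_000_000, dims=768)
print(f"{total / 1e9:.1f} GB")  # ~614.4 GB for 200M x 768 float32 vectors
```

Dropping from 32-bit floats to 8-bit quantized values (`bytes_per_value=1`) would cut this to roughly a quarter, which is why quantization is one of the suggested mitigations.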
Thanks for the feedback. We will be working on developing these guidelines.
For now, just keep in mind that for fast vector search we suggest having at least enough RAM to hold your vectors (4 bytes * number of dims * number of docs), and this RAM is outside of the Java heap.
Could you please elaborate a little bit more on what that means?
In my previous post, we found that search performs better when the Java heap is reduced. The machine has 128 GB of RAM, and we reduced the heap from the recommended half of RAM (64 GB) down to 24 GB via -Xms24g -Xmx24g.
That configuration worked better.
Am I right in assuming that the HNSW implementation could then use more RAM and run faster?
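For reference, the heap override described above can be pinned in a custom JVM options file (the file name is arbitrary; the `config/jvm.options.d/` directory is the standard location in recent Elasticsearch versions):

```
# config/jvm.options.d/heap.options
-Xms24g
-Xmx24g
```

Setting -Xms and -Xmx to the same value avoids heap resizing pauses, and keeping the heap small leaves the remaining RAM available to the OS page cache.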
I experimented with my setup to observe how RAM behaves with a reduced heap.
Using htop, I could not see any additional RAM being used by HNSW.
If it's not using the Java heap and I cannot detect any changes in htop, where is the structure stored?
Update: I noticed that htop showed the RAM as full with yellow (except for the green Elasticsearch part). Yellow refers to the disk cache. Am I right in assuming that this is the HNSW structure, which is stored on disk but is now cached in RAM?