ANN Search: Performance / Setup

I recently wrote a post reporting some issues with the ANN search setup: ANN Search Timeouts - #8 by Julie_Tibshirani

The main takeaway for me was to set "index.refresh_interval": "-1" and to run a first request with "_source": false in order to reach acceptable performance. Thanks again @mayya @Julie_Tibshirani
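As a rough illustration of those two takeaways, the sketch below only builds and prints the request bodies one might send; nothing goes over the network. The index name `index_d_cos`, the field name `embedding`, and the `k` / `num_candidates` values are placeholders rather than values from our setup, and the `knn` search option assumes Elasticsearch 8.4 or later.

```python
import json

# Hypothetical names for illustration; replace with your own index and field.
INDEX_NAME = "index_d_cos"
VECTOR_FIELD = "embedding"
DIMS = 768


def refresh_settings(interval: str) -> dict:
    """Body for PUT /<index>/_settings; "-1" disables automatic refresh."""
    return {"index": {"refresh_interval": interval}}


def warmup_knn_body(query_vector: list, k: int = 10,
                    num_candidates: int = 100) -> dict:
    """Body for POST /<index>/_search that does not return _source."""
    if len(query_vector) != DIMS:
        raise ValueError(f"expected {DIMS} dims, got {len(query_vector)}")
    return {
        "knn": {
            "field": VECTOR_FIELD,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": False,
    }


if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}/_settings")
    print(json.dumps(refresh_settings("-1")))

    body = warmup_knn_body([0.1] * DIMS)
    # Shorten the vector so the printed body stays readable.
    body["knn"]["query_vector"] = f"<{DIMS} floats>"
    print(f"POST /{INDEX_NAME}/_search")
    print(json.dumps(body))
```

Note that with refresh disabled, newly indexed documents only become searchable after an explicit refresh, so this setting mainly suits bulk loading followed by searching.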

We have since added another index, Index_d, with more than 105 million documents, each with a 768-dimensional vector. This index might grow to over 200 million documents.

So in total there are:

  • Index a_cos: ~6 million documents
  • Index b_cos: ~10 million documents
  • Index c_cos: ~3 million documents
  • Index_d_cos: ~105 million documents

It currently takes 58 minutes to run a single ANN search.
That is far too long for our use case.
My hypothesis is that the index does not fit in RAM.

The setup is in a cloud environment where we currently have:

  • 1 Node
    • 8 vCPUs
    • 128 GB of RAM
    • 4 TB of SSD storage

So now my questions are:

  • What would be a better setup to reach acceptable performance (requirement: ~1 s per search)?
    • Cluster size?
    • Node size?
      • vCPUs
      • RAM
      • SSD storage
  • Are there any additional performance tweaks?

Thank you so much.

Thanks for reporting your use case.

58 minutes is an extremely long time for an ANN search. Are you sure that this search is not blocked on indexing? Are you running these searches after all indexing is done and the index has been refreshed?

For the fastest searches, we recommend having enough RAM for all vectors to fit in. For example, if you have 200M vectors of 768 dims, and each dim is a float taking 4 bytes, a comfortable RAM size would be at least 4 * 768 * 200M bytes ≈ 614 GB (about 572 GiB). That's really a lot. There are several ways to address it:

  • distribute the vector search across several machines
  • reduce the number of dims. 768 is a lot of dims; is there a way to reduce them?
  • quantize vector values to lower precision (e.g. 8 bits instead of 32 bits). This is still work in progress on the Lucene side and currently not supported in Elasticsearch, but we aspire to have it.
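The arithmetic above can be reproduced with a short sketch. It counts only the raw vector values (it ignores the HNSW graph and everything else in the index), and the reduced-dimension and 8-bit rows are hypothetical scenarios for comparison, not supported configurations.

```python
BYTES_PER_FLOAT32 = 4


def vector_bytes(num_docs: int, dims: int,
                 bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of the vector values alone (no HNSW graph, no other fields)."""
    return num_docs * dims * bytes_per_value


def to_gb(num_bytes: int) -> float:
    """Decimal gigabytes (10**9 bytes)."""
    return num_bytes / 10**9


def to_gib(num_bytes: int) -> float:
    """Binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    scenarios = [
        # (label, number of docs, dims, bytes per value)
        ("105M docs, 768 dims, float32", 105_000_000, 768, 4),
        ("200M docs, 768 dims, float32", 200_000_000, 768, 4),
        ("200M docs, 384 dims, float32", 200_000_000, 384, 4),  # hypothetical
        ("200M docs, 768 dims, 8-bit", 200_000_000, 768, 1),    # hypothetical
    ]
    for label, docs, dims, width in scenarios:
        size = vector_bytes(docs, dims, width)
        print(f"{label}: {to_gb(size):.1f} GB ({to_gib(size):.1f} GiB)")
```

Even the current 105M documents come to roughly 323 GB of raw vectors, well above the 128 GB on the single node described in the question.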

Thanks @mayya for your answer.
Is there a blueprint for an optimal node setup?

  • How large should the RAM be?
  • How many CPUs?

Thank you so much.

I also want to step in and say it would be very useful to have guidelines on the right node/cluster setup for efficient ANN search.

For example, which resources (CPU, RAM, disk, number of nodes) matter most, and which have the largest impact on ANN indexing and search?


Thanks for the feedback. We will be working on developing these guidelines.
For now, just consider that for fast vector search we suggest having at least enough RAM to hold your vectors (4 bytes * number of dims * number of docs). Note that this RAM is outside of the Java heap.
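This rule of thumb can be turned into a back-of-envelope check. The sketch below assumes the 128 GB machine and 24 GB heap mentioned in this thread; the 2 GB operating-system reserve is an arbitrary placeholder, and the estimate ignores the HNSW graph, other index data, and replicas, so the node count is a lower bound rather than a sizing recommendation.

```python
import math

BYTES_PER_FLOAT32 = 4
GB = 10**9


def vector_bytes(num_docs: int, dims: int) -> int:
    """4 bytes * number of dims * number of docs."""
    return BYTES_PER_FLOAT32 * dims * num_docs


def off_heap_bytes(total_ram_gb: float, heap_gb: float,
                   os_reserve_gb: float = 2.0) -> float:
    """RAM left outside the Java heap; os_reserve_gb is a made-up allowance."""
    return (total_ram_gb - heap_gb - os_reserve_gb) * GB


def min_nodes(needed_bytes: float, per_node_bytes: float) -> int:
    """Smallest node count whose combined off-heap RAM holds the vectors."""
    return math.ceil(needed_bytes / per_node_bytes)


if __name__ == "__main__":
    per_node = off_heap_bytes(total_ram_gb=128, heap_gb=24)
    for docs in (105_000_000, 200_000_000):
        need = vector_bytes(docs, 768)
        print(f"{docs:,} docs: need {need / GB:.1f} GB, "
              f"one node offers {per_node / GB:.0f} GB off-heap, "
              f"fits on one node: {need <= per_node}, "
              f"min nodes: {min_nodes(need, per_node)}")
```

Under these assumptions a single 128 GB node cannot hold the vectors of the largest index in RAM, which matches the hypothesis in the original question.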


Thanks @mayya, for your reply.

I have one question regarding your statement that this RAM is outside of the Java heap.

Could you please elaborate a little on what that means?

In my previous post, we found that search performs better when the Java heap is reduced. The machine had 128 GB of RAM, and we reduced the heap from the recommended half of RAM (64 GB) to 24 GB via -Xms24g -Xmx24g.
That configuration worked better.

Am I right in assuming that the HNSW implementation could then use more RAM and run faster?

I experimented with my setup to observe how RAM usage behaves with a reduced heap.
Using htop, I could not observe any additional RAM being used by HNSW.

If it's not using the Java heap and I cannot detect any changes in htop, where is the structure stored?

Update: I observed that htop showed the RAM as full, mostly in yellow (apart from the green Elasticsearch part). Yellow refers to the disk cache. Am I right in assuming that this is the HNSW structure, which is stored on disk but is now cached in RAM?
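One way to put a number on the yellow bar is to read the kernel's own counters. This is a minimal sketch for a Linux host, assuming that what htop draws as cache corresponds roughly to the `Cached` field of `/proc/meminfo`; the helper names are made up for illustration, and the figure covers all cached files on the machine, not only Elasticsearch index files.

```python
from pathlib import Path

MEMINFO_PATH = Path("/proc/meminfo")  # Linux only


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field name: integer value}.

    Most fields are reported in kB, e.g. "Cached:  104857600 kB".
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def kb_to_gib(kilobytes: int) -> float:
    """Convert kB (as used in /proc/meminfo) to GiB."""
    return kilobytes / 2**20


if __name__ == "__main__":
    if MEMINFO_PATH.exists():
        info = parse_meminfo(MEMINFO_PATH.read_text())
        for field in ("MemTotal", "MemAvailable", "Cached"):
            if field in info:
                print(f"{field}: {kb_to_gib(info[field]):.1f} GiB")
    else:
        print("No /proc/meminfo on this system")
```

Comparing `Cached` before and after a few searches would show whether the page cache grows as index files are read.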

Thank you.
