We have implemented vector similarity search using the Elasticsearch dense_vector field type and the knn option in the search API.
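For reference, the mapping looks roughly like this (the index and field names here are placeholders, not our real ones):

```json
PUT /my-index
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
```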
We are using 1024-dimensional embeddings, and our index size is about 60 GB for approximately 11,000,000 documents: primary_store_size is 60.3 GB and store_size is 179.3 GB.
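As a back-of-the-envelope sanity check on the memory requirements (assuming float32 vectors; the HNSW graph adds further overhead on top of this):

```python
# Rough estimate of the RAM needed just for the raw vectors.
# Assumes float32 storage (4 bytes per dimension); HNSW graph
# structures and other index files come on top of this.
num_docs = 11_000_000
dims = 1024
bytes_per_float = 4

raw_vector_bytes = num_docs * dims * bytes_per_float
raw_vector_gib = raw_vector_bytes / 1024**3
print(f"Raw vectors alone: ~{raw_vector_gib:.1f} GiB")  # → ~42.0 GiB
```

So the raw vectors alone account for most of the 60.3 GB primary store size, which is why we expected fitting the index in RAM to matter so much.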
During testing we found that searches take quite a lot of time: 20-30 seconds when we set num_candidates to around 400-500.
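The search requests look roughly like this (index and field names are placeholders, and the query vector is truncated here):

```json
GET /my-index/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.34, ...],
    "k": 10,
    "num_candidates": 500
  },
  "_source": false
}
```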
Our original thinking was that the whole index was not fitting into memory (the server has 128 GB of RAM), so we built a test server with 128 GB of RAM and cloned the drive. Now both primary_store_size and store_size are 60.3 GB, so the index should definitely fit in RAM (nothing else is running on this server). But we did not really see any improvement in search time; sometimes searches actually take longer.
How can I troubleshoot this problem? Why didn't we see any improvement in search speed, even though the whole index fits into RAM (which by itself should be a major boost to search performance)?
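For example, would enabling the search profiler on one of these queries help pinpoint the slow phase? Something like the following (I'm not sure how much of the kNN phase shows up in the profile output):

```json
GET /my-index/_search
{
  "profile": true,
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.34, ...],
    "k": 10,
    "num_candidates": 500
  }
}
```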
PS: I did read the article here and implemented some of the suggestions: using dot_product similarity, providing enough RAM, excluding vector fields from _source, avoiding heavy indexing during searches, and avoiding page-cache thrashing by using modest readahead values on Linux.
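One suggestion I was unsure about is preloading the vector data into the page cache. If I understood it correctly, that would be a static index setting like the one below, applied at index creation (or on a closed index); the file extensions are my understanding of where Lucene keeps the vector values and HNSW graph, so please correct me if they are wrong:

```json
PUT /my-index
{
  "settings": {
    "index.store.preload": ["vec", "vex", "vem"]
  }
}
```

Would this be expected to help in our case, given that the index already fits in RAM?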