We have implemented vector similarity search using the Elasticsearch dense_vector field and the kNN option in the search API.
We are using 1024-dimensional embeddings, and our index is about 60 GB for approximately 11,000,000 documents. So our
primary_store_size is 60.3 GB and
store_size is 179.3 GB.
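As a back-of-envelope check on those sizes, the raw float32 vectors alone account for most of the index (this is just arithmetic on the figures above; dense_vector stores float32 by default):

```python
# Rough memory math for the index described above:
# 11,000,000 documents x 1024 dimensions x 4 bytes per float32.
docs = 11_000_000
dims = 1024
bytes_per_float = 4  # dense_vector default element size

raw_vector_bytes = docs * dims * bytes_per_float
raw_vector_gib = raw_vector_bytes / 1024**3
print(f"raw vectors: {raw_vector_gib:.1f} GiB")  # -> raw vectors: 42.0 GiB
```

So roughly 42 GiB of the 60.3 GB primary store is the vectors themselves, which is why we expected 128 GB of RAM to hold the whole index comfortably.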
During testing we found that searches take quite a lot of time, 20-30 seconds, when we set
num_candidates to something like 400-500.
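For context, a minimal sketch of the kind of kNN request we are running (my-index and embedding are placeholder names here, not our actual mapping):

```json
POST /my-index/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.07, ...],
    "k": 10,
    "num_candidates": 500
  },
  "_source": false
}
```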
Our original thinking was that the whole index was not fitting into memory (the server has 128 GB RAM), so we built a test server with 128 GB RAM and cloned the drive. Now both
primary_store_size and store_size are 60.3 GB, and the index should definitely fit into RAM (nothing else is running on this server). But we did not see any real improvement in search time; sometimes searches actually take longer.
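One check we can run to see whether searches are actually served from the page cache or still hitting disk (standard Linux tools; sdb is a placeholder for whatever device holds the Elasticsearch data path):

```
# Watch disk reads while running test searches; if the index is fully
# cached, read throughput on the data disk should stay near zero.
iostat -x 5 /dev/sdb

# Check the current readahead setting (in KiB) for the data disk.
cat /sys/block/sdb/queue/read_ahead_kb
```

If iostat shows sustained reads during a 20-30 second search, the index is not really being served from memory despite fitting in RAM on paper.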
How can I troubleshoot this problem? Why did we not see any improvement in search speed, even though the whole index fits into RAM (which by itself should be a major boost to search performance)?
PS: I did read the article here and implemented some of its suggestions: dot_product similarity, enough RAM, excluding vector fields from _source, avoiding heavy indexing during searches, and avoiding page-cache thrashing by using modest readahead values on Linux.
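For completeness, the mapping-level suggestions we applied look roughly like this (a sketch only; my-index and embedding are placeholder names, and dot_product assumes the embeddings are normalized to unit length):

```json
PUT /my-index
{
  "mappings": {
    "_source": {
      "excludes": ["embedding"]
    },
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
```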