Hi everyone, I have performance issues with vector search. Can anyone please help!
I have the following setup:
- 5 nodes running on K8s
- Each node has 32vCPU, 128GB RAM
- Total index size ~6.5 TB at the moment (roughly 90,000,000 documents)
- Each document includes a dense vector field used for vector search. The field has 768 dimensions!
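For context, the vector field mapping looks roughly like this (the index and field names are placeholders; note that the `dot_product` similarity mentioned below requires the vectors to be unit-normalized at index time):

```shell
curl -X PUT "localhost:9200/my-vector-index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}'
```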
After reading the documentation on performance optimisation for kNN search (Tune approximate kNN search | Elasticsearch Guide [8.10] | Elastic), I did the following:
- Make sure the cluster has sufficient memory to hold the HNSW graph in RAM.
- Use the dot product.
- Reduce the number of segments to 2 (per shard).
- Warm up the filesystem cache.
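Concretely, the last two steps look roughly like this (index name is a placeholder; `_forcemerge` should only be run once indexing into the index has stopped, and `index.store.preload` is a static setting, so it has to be applied at index creation or on a closed index):

```shell
# Reduce the segment count to 2 per shard (only on an index
# that is no longer being written to):
curl -X POST "localhost:9200/my-vector-index/_forcemerge?max_num_segments=2"

# Preload the vector files into the filesystem cache on startup
# ("vec" = vector values, "vex" = HNSW graph, "vem" = metadata).
# The setting is static, so close, update, and reopen the index:
curl -X POST "localhost:9200/my-vector-index/_close"
curl -X PUT "localhost:9200/my-vector-index/_settings" \
  -H 'Content-Type: application/json' -d'
{ "index.store.preload": ["vec", "vex", "vem"] }'
curl -X POST "localhost:9200/my-vector-index/_open"
```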
This resulted in the desired response times for our use case!
However…
- When doing heavy inserts into the index, the segment count increases and performance drops.
- As a potential workaround, I decided to index all data into a new index, force-merge it down to a low segment count, point the alias at it, and drop the old index, to avoid performance degradation while indexing.
- It seems, however, that the heavy indexing pushes the HNSW graph out of RAM, making searches slow again… I can see insanely high disk I/O when doing kNN searches during these inserts.
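A back-of-envelope calculation makes the eviction plausible. Raw float32 vector data alone, before any HNSW graph links or replicas, comes to:

```shell
# 90M docs x 768 dims x 4 bytes per float32
bytes=$((90000000 * 768 * 4))
gib=$((bytes / 1024 / 1024 / 1024))
echo "raw vectors: ${gib} GiB"   # ~257 GiB
```

That is ~257 GiB before graph overhead, and double it if you run one replica. Against 5 × 128 GB of RAM, minus the JVM heap on each node, the page cache budget is tight, so heavy indexing competing for the remaining cache would explain the disk I/O you're seeing.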
Some potential solutions:
- Add more nodes to get more RAM, in the hope that the HNSW graph is no longer pushed out of the cache.
- Research the possibility of dedicated nodes that handle indexing, so that the HNSW graph on the search-serving nodes is not pushed out. I'm not even sure whether this is possible? Anyone ideas?
- Hopefully someone else has another idea?
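On the dedicated-nodes idea: one way I could imagine approximating it is custom node attributes plus index-level allocation filtering, so the new index is built on separate hardware and only relocated once it's merged (attribute and index names below are placeholders, not something from the docs):

```shell
# In elasticsearch.yml on the nodes that should serve searches:
#   node.attr.workload: search
# and on the nodes that should absorb the heavy indexing:
#   node.attr.workload: indexing

# Build the new index on the indexing nodes only:
curl -X PUT "localhost:9200/my-vector-index-v2/_settings" \
  -H 'Content-Type: application/json' -d'
{ "index.routing.allocation.require.workload": "indexing" }'

# After indexing and force-merging, relocate the shards to the
# search nodes before swapping the alias over:
curl -X PUT "localhost:9200/my-vector-index-v2/_settings" \
  -H 'Content-Type: application/json' -d'
{ "index.routing.allocation.require.workload": "search" }'
```

Whether this actually keeps the page cache on the search nodes warm during the relocation, I don't know. Would love to hear if anyone has tried it.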
Thanks in advance!