Managed Elastic search for billion scale dense vector index and performance

cvgoudar · October 11, 2022, 10:10am

We want to understand the requirements and expected response time for vector-based search over an index of close to 1 billion semantic vectors. Each dense vector has 768 dimensions.

What would be an appropriate configuration of managed Elasticsearch to get kNN search responses within 100-200 ms? We would like to retrieve roughly the top 20 matches by vector similarity.

Also, will there be a performance impact if the filter option is used?

Julie_Tibshirani · November 7, 2022, 11:39pm

Hello @cvgoudar, unfortunately we don't have published benchmarks for this configuration (1 billion vectors with 768 dimensions). To determine whether the performance will be acceptable, our recommendation is to test your dataset and queries using a benchmarking framework like Elasticsearch Rally (https://esrally.readthedocs.io). You can start with a single node, fit as many vectors as possible into it, and then calculate how many nodes you will need for the full 1 billion vector dataset. Here are some resources that can help:

The kNN search tuning guide ("Tune approximate kNN search" in the Elasticsearch Guide [8.5]). This guide explains that memory is a primary bottleneck for vector search: you need enough RAM available on the node to hold all the vector data in the page cache.
In an upcoming release, we'll add support for lower-precision vector element types: https://github.com/elastic/elasticsearch/pull/90774. This is sometimes called "quantization" and can really help reduce memory requirements for large datasets like yours.
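
To make the sizing advice above concrete, here is a rough back-of-the-envelope sketch in Python. It counts only the raw vector bytes and ignores the HNSW graph, other fields, and replicas, so treat the result as a lower bound. The 1-byte-per-dimension figure for a lower-precision type and the 20 million vectors per node are assumptions for illustration, not measured or documented values; the per-node number is exactly what a single-node benchmark would need to establish.

```python
import math

# Bytes per vector element. "float" is a 4-byte float; "byte" assumes a
# lower-precision element type that stores 1 byte per dimension.
BYTES_PER_ELEMENT = {"float": 4, "byte": 1}


def raw_vector_bytes(num_vectors, dims, element_type="float"):
    """Raw size of the vector data only (no graph, no other fields)."""
    return num_vectors * dims * BYTES_PER_ELEMENT[element_type]


def nodes_needed(total_vectors, vectors_per_node):
    """Extrapolate a node count from a single-node measurement."""
    return math.ceil(total_vectors / vectors_per_node)


NUM_VECTORS = 1_000_000_000
DIMS = 768

float_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, "float")
byte_bytes = raw_vector_bytes(NUM_VECTORS, DIMS, "byte")

print(f"float vectors: {float_bytes / 1e12:.2f} TB")  # float vectors: 3.07 TB
print(f"byte vectors:  {byte_bytes / 1e9:.0f} GB")    # byte vectors:  768 GB

# Hypothetical: suppose one node held 20 million vectors at acceptable latency.
print(nodes_needed(NUM_VECTORS, 20_000_000))  # 50
```

Since the page cache has to hold this data, the float figure alone suggests several terabytes of RAM across the cluster before any graph or replica overhead is added.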

About filtering: yes, approximate kNN search is usually slower when using a 'filter'. This is because the search needs to skip over documents that do not match the filter. If a filter is very selective (meaning it matches few documents), the performance impact can be greater. If you are always using a filter and it's quite selective, then you should check whether exact kNN search (see "k-nearest neighbor (kNN) search" in the Elasticsearch Guide [8.5]) is a better fit for your use case.
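
As an illustration of the two options, the sketch below builds the two search request bodies as plain Python dictionaries: an approximate kNN search with an optional filter, and an exact (brute-force) search that scores only the documents matching the filter via `script_score`. The field name `embedding`, the `category` filter, and the `num_candidates` value are hypothetical; adjust them to your mapping and verify the request shape against the documentation for your Elasticsearch version.

```python
def approx_knn_body(query_vector, filter_clause=None, k=20, num_candidates=200):
    """Body for an approximate kNN search; the filter is applied during the search."""
    knn = {
        "field": "embedding",          # hypothetical dense_vector field name
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
    if filter_clause is not None:
        knn["filter"] = filter_clause
    return {"knn": knn, "size": k}


def exact_knn_body(query_vector, filter_clause, k=20):
    """Body for exact kNN: every document matching the filter is scored."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"bool": {"filter": filter_clause}},
                "script": {
                    # +1.0 keeps scores non-negative, as script_score requires
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


query = [0.0] * 768                                  # placeholder query vector
selective = {"term": {"category": "manuals"}}        # hypothetical selective filter

approx = approx_knn_body(query, selective)
exact = exact_knn_body(query, selective)
```

The exact variant scales with the number of documents the filter matches, so it is only attractive when the filter is selective enough to leave a small candidate set.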

ruslaniv · November 25, 2022, 6:51am

I'd be VERY cautious and run extensive testing before indexing a dataset this large. Check this discussion: "Dense vector field space requirements".

system · December 23, 2022, 6:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
