Vector search for large amount of data with limited RAM resources

louis_sg · July 5, 2023, 2:04pm

Hi there,
I am new here and trying to use Elasticsearch for vector similarity search.
My dataset has 100M records, each containing a 384-dimensional vector and a string payload.
I am building an HNSW index, but I get OOM errors even when I use a small portion (10M records) of my data in a Docker container with 25 GB of RAM. Given my machine's total available RAM, it would be hard to add enough memory to fit all 100M records.
Any suggestions on how to load all the data into Elasticsearch with limited RAM?
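
For scale, here is a rough back-of-the-envelope sketch of the raw vector footprint. It assumes the vectors are stored as 4-byte floats, which the post does not state, and it counts only the vector values themselves: the HNSW graph, the string payloads, and any JVM heap usage are not included. The function name `raw_vector_bytes` is made up for this illustration.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Size of the raw vector values only (assumes float32 by default)."""
    return num_vectors * dims * bytes_per_value


GB = 10**9  # decimal gigabytes

full_set = raw_vector_bytes(100_000_000, 384)  # all 100M records
subset = raw_vector_bytes(10_000_000, 384)     # the 10M-record portion

print(f"100M vectors: {full_set / GB:.1f} GB")  # 153.6 GB
print(f" 10M vectors: {subset / GB:.1f} GB")    # 15.4 GB
```

Under that assumption, the 10M subset alone is already a large share of a 25 GB container before any index overhead is counted.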

Alex_Salgado-Elastic · July 5, 2023, 2:24pm

Hi @louis_sg, have you tried this solution from Stack Overflow? "Uploading large 800gb json file from remote server to elasticsearch"

Split your data into smaller chunks and send them to Elasticsearch using multiple Bulk Requests.
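
A minimal sketch of what that chunking could look like, assuming the records are available as an iterable of dicts. It only builds the newline-delimited bodies in the format the Bulk API expects (one action line followed by one document line) and does not send anything. The index name `my-vectors`, the field names, and the helper names `chunked` and `bulk_body` are placeholders for illustration.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def chunked(items: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` items without loading everything."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(batch: List[Dict[str, Any]], index_name: str) -> str:
    """Build one NDJSON bulk body: an action line, then the document line."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical data: 150 records with a 384-dimensional placeholder vector each.
docs = ({"payload": f"record {i}", "embedding": [0.0] * 384} for i in range(150))

bodies = [bulk_body(batch, "my-vectors") for batch in chunked(docs, 64)]
print(len(bodies))  # number of bulk requests that would be sent
```

Each string in `bodies` could then be posted to the `_bulk` endpoint as one request. Because `chunked` consumes a generator, only one batch needs to be held in client memory at a time.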

louis_sg · July 5, 2023, 3:10pm

Yes, I am already using this method to upload records in batches of 64.

Alex_Salgado-Elastic · July 5, 2023, 4:03pm

Could you share the mapping for the field that stores the embeddings, and the maximum size of that field?
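
For reference, a `dense_vector` mapping for a 384-dimensional field might look like the sketch below, written as a Python dict that could be used as the body of an index-creation request. This is not the actual mapping from this thread: the field names `embedding` and `payload`, the similarity, and the HNSW values are all assumptions chosen for illustration.

```python
import json

# Hypothetical index body; every name and value here is illustrative.
index_body = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 384,             # must match the embedding size
                "index": True,           # build an HNSW graph for kNN search
                "similarity": "cosine",  # assumed; depends on the embedding model
                "index_options": {
                    "type": "hnsw",
                    "m": 16,                 # illustrative value
                    "ef_construction": 100,  # illustrative value
                },
            },
            "payload": {"type": "text"},
        }
    }
}

print(json.dumps(index_body, indent=2))
```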

system · August 2, 2023, 4:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
