Hi!
We are experimenting with the `dense_vector` field type for similarity search. We have a test index with approx 5_000_000 documents; each document has about 30 fields, and 25 of these fields are mapped to both `keyword` and `text`.
So I created a new index where 24 of these 25 fields are mapped only to `keyword` with `index` set to `false`, and only one field is mapped to `dense_vector` with 768 dimensions and `index` set to `true`.
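Roughly, the new mapping looks like this (a simplified sketch: the index and field names are placeholders, only one of the 24 keyword-only fields is shown, and I'm assuming 8.x `dense_vector` syntax with an explicit `similarity`):

```
PUT test-index-vectors
{
  "mappings": {
    "properties": {
      // stands in for the 24 keyword-only, non-indexed fields
      "some_keyword_field": {
        "type": "keyword",
        "index": false
      },
      // the single indexed vector field
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```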
After indexing 500_000 documents we noticed that the index size is already at 10 GB, so the full test set would be approx 100 GB, whereas the current index takes only approx 30 GB.
My assumption was that by eliminating so many redundant fields and excluding the remaining ones from indexing, the index size would be somewhat smaller, but it looks like it is going to be considerably larger instead.
According to Performance and storage of the dense_vector type, each vector takes about 3 kB before compression (768 dimensions × 4 bytes per float ≈ 3 kB), so 500_000 vectors should account for roughly 1.5 GB on their own, which makes the size of my new index really puzzling.
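In case it helps, this is the kind of call I'd use to break the index size down per field (a sketch, assuming the analyze index disk usage API available since 7.15 and the same placeholder index name):

```
// reports per-field storage, including the space used by the vector data
POST /test-index-vectors/_disk_usage?run_expensive_tasks=true
```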
Does the dense_vector field really take this much space, or am I doing something wrong?