Dense vector field space requirements

Hi!
We are experimenting with the dense_vector field type for similarity search.

We have a test index with approximately 5_000_000 documents. Each document has about 30 fields, and 25 of these fields are mapped as both keyword AND text.

I created a new index in which 24 of these 25 fields are mapped only as keyword, with index set to false. Only one field is mapped as dense_vector, with 768 dimensions and index set to true.
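To make the setup concrete, here is a minimal sketch of what such a mapping could look like, written as a Python dict. The field names (field_1 ... field_24, embedding) are placeholders, and the similarity setting is an assumption since I did not mention it above; only the types, dims and index flags come from my description.

```python
# Hypothetical field names; only the types and options come from the description.
KEYWORD_FIELDS = [f"field_{i}" for i in range(1, 25)]  # 24 keyword-only fields


def build_mapping(dims: int = 768) -> dict:
    """Build an index body with unindexed keyword fields and one indexed dense_vector."""
    properties = {
        name: {"type": "keyword", "index": False} for name in KEYWORD_FIELDS
    }
    properties["embedding"] = {
        "type": "dense_vector",
        "dims": dims,
        "index": True,
        "similarity": "cosine",  # assumption: the post does not say which similarity is used
    }
    return {"mappings": {"properties": properties}}


mapping = build_mapping()
```

The resulting dict could be passed as the body when creating the index; I have not shown the client call because the client version is not stated here.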

After indexing 500_000 documents we noticed that the index size is already at 10 GB, so the full test set would be approximately 100 GB, whereas the current index takes only about 30 GB.

My assumption was that by eliminating so many redundant fields and removing them from the index, the index would become somewhat smaller, but it looks like it is going to be considerably larger instead.

According to the page "Performance and storage of the dense_vector type", each vector takes about 3 kB before compression, so the size of my new index really puzzles me.
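For reference, this is the back-of-the-envelope calculation I am basing my expectation on, using the 4*dims+4 bytes per vector formula from that page. It is only a sketch: I am assuming the formula covers just the stored vector values, so anything else the index keeps per document (for example _source or the structures built when index is true) is not included, and I am taking 1 GB as 10^9 bytes.

```python
def raw_vector_bytes(dims: int) -> int:
    """Bytes per vector according to the 4*dims+4 formula (before compression)."""
    return 4 * dims + 4


def estimate_gb(num_docs: int, dims: int) -> float:
    """Estimated size of the raw vector values only, in GB (1 GB = 10**9 bytes)."""
    return num_docs * raw_vector_bytes(dims) / 1e9


per_vector = raw_vector_bytes(768)        # 3076 bytes, i.e. about 3 kB
total_gb = estimate_gb(5_000_000, 768)    # about 15.38 GB for the full test set

print(per_vector, round(total_gb, 2))
```

So by this formula the vectors alone should account for roughly 15 GB of the projected 100 GB.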

Does dense vector field take so much space or am I doing something wrong?

Well, I just saved a 768-dimensional vector in a plain text file and it takes about 17 KB on disk.
So 5_000_000 documents would require about 85 GB. But then I do not understand where the formula 4*dims+4 comes from, because I'm definitely seeing different results.
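A small sketch of the comparison I am making, assuming the formula counts 4 bytes per dimension as a 32-bit float. The vector here is random data rather than one of my real embeddings, and the text size depends on how many digits each value is written with, so I am not quoting an exact number for it.

```python
import json
import random
import struct

random.seed(0)
# Random stand-in for a real 768-dimensional embedding.
vector = [random.uniform(-1.0, 1.0) for _ in range(768)]

# Packed as 32-bit floats: 4 bytes per dimension.
binary_size = len(struct.pack("<768f", *vector))

# Written out as text (JSON), the way it appears in a plain text file.
text_size = len(json.dumps(vector).encode("utf-8"))

print(f"binary float32: {binary_size} bytes")
print(f"plain text:     {text_size} bytes")
```

The packed form is 3072 bytes, which matches the 4*dims part of the formula, while the text form is several times larger because every value is spelled out digit by digit.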

Just to add some empirical evidence: after indexing nearly 5_500_000 documents with a dense_vector field of dimensionality 1024 on 2 different servers (test and prod), the index size is approximately 22 GB per 1_000_000 documents.
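Put next to the 4*dims+4 formula, that observation works out as follows. This is a rough sketch that takes 1 GB as 10^9 bytes and attributes the whole index size to each document evenly, which ignores whatever the other fields contribute.

```python
DIMS = 1024

# Observed: about 22 GB per 1_000_000 documents.
observed_bytes_per_doc = 22e9 / 1_000_000

# Expected raw vector size from the 4*dims+4 formula.
formula_bytes_per_doc = 4 * DIMS + 4

ratio = observed_bytes_per_doc / formula_bytes_per_doc

print(f"observed: {observed_bytes_per_doc:.0f} bytes per document")
print(f"formula:  {formula_bytes_per_doc} bytes per document")
print(f"ratio:    {ratio:.1f}x")
```

That is about 22 KB per document observed against 4100 bytes from the formula, so the index is a bit more than five times larger than the raw vectors alone would suggest.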
