Hi!
We are experimenting with the `dense_vector` field type for similarity search. We have a test index with approx 5_000_000 documents; each document has about 30 fields, and 25 of these fields are mapped to both `keyword` and `text`.
So I created a new index where 24 of these 25 fields are mapped only to `keyword` with `index` set to `false`, and only one field is mapped to `dense_vector` with 768 dimensions and `index` set to `true`.
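Roughly, the new mapping looks like this (a simplified sketch: the index and field names are placeholders, only one of the 24 keyword-only fields is shown, and I'm assuming 8.x `dense_vector` syntax with an explicit `similarity`):

```
PUT test-index-vectors
{
  "mappings": {
    "properties": {
      // stands in for the 24 keyword-only, non-indexed fields
      "some_keyword_field": {
        "type": "keyword",
        "index": false
      },
      // the single indexed vector field
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```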
After indexing 500_000 documents we noticed that the index size is already at 10 GB, so the full test set would be approx 100 GB, whereas the current index takes only approx 30 GB.
My assumption was that by eliminating so many redundant fields and excluding the remaining ones from indexing, the index size would be somewhat smaller, but it looks like it is going to be considerably larger instead.
According to Performance and storage of the dense_vector type, each vector takes about 3 kB before compression (768 dimensions × 4 bytes per float ≈ 3 kB), so 500_000 vectors should account for roughly 1.5 GB on their own, which makes the size of my new index really puzzling.
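In case it helps, this is the kind of call I'd use to break the index size down per field (a sketch, assuming the analyze index disk usage API available since 7.15 and the same placeholder index name):

```
// reports per-field storage, including the space used by the vector data
POST /test-index-vectors/_disk_usage?run_expensive_tasks=true
```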
Does the dense_vector field really take this much space, or am I doing something wrong?