Hi all,
I have been experimenting with applying both forms of compression to our dense vectors and comparing performance. bbq_hnsw has been performing relatively well at an average of 1-3s per query, but int8 has been extremely slow at roughly 8-10s+ per kNN query.
I dug into the disk usage of both vector fields (each contains the same 10M vectorized images at 512 dimensions), thinking the fields might be bigger than I expected and that I hadn’t given the pod hosting them enough memory to keep both in memory. For 10M 512-dimensional assets, the BBQ field looks about right in size with its built-in rescore of 3, but the int8 field is off the charts. Its size looks like the combined total of the uncompressed and compressed vectors.
"image_vector_bbq": {
  "total": "1.3gb",
  "total_in_bytes": 1443088252,
  "knn_vectors": "1.3gb",
  "knn_vectors_in_bytes": 1443088252
},
"image_vector_int8": {
  "total": "26.4gb",
  "total_in_bytes": 28400884275,
  "knn_vectors": "26.4gb",
  "knn_vectors_in_bytes": 28400884275
},
Is this normal behavior? I couldn’t find any way in the documentation to break this number down further. Off the top of my head, since it shows up in one field and not the other, I’m leaning towards not normal. I’m mainly trying to work out whether it’s so slow because, for whatever reason, it is trying to load 26.4gb into memory when only 10GB is allocated to the application, with a 5GB heap. I do realize the heap size may need to inch up a bit, since by my estimates the two compressed vector fields combined are likely in the 6-7 GB range. Regardless, my understanding was that the uncompressed vectors are stored on disk and inflate the _source size, but are not stored within the int8 compressed vector field.
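As a rough sanity check on the numbers above, here is a back-of-the-envelope sketch in Python. It assumes 4 bytes per dimension for the uncompressed float32 vectors, 1 byte per dimension for int8, and 1 bit per dimension for BBQ. It ignores the HNSW graph and any per-vector correction terms, so the names and byte sizes here are my assumptions, not something taken from the disk usage output.

```python
# Back-of-the-envelope sizes for 10M vectors with 512 dimensions.
# Assumed encodings: float32 = 4 bytes/dim, int8 = 1 byte/dim, BBQ = 1 bit/dim.
# HNSW graph and per-vector correction data are deliberately ignored.
NUM_VECTORS = 10_000_000
DIMS = 512


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB (1024-based, which matches the "gb" figures above)."""
    return n_bytes / 2**30


raw_float32 = NUM_VECTORS * DIMS * 4      # uncompressed vectors
int8_quantized = NUM_VECTORS * DIMS * 1   # int8 copies
bbq_quantized = NUM_VECTORS * DIMS // 8   # 1-bit copies

# Figures copied from the disk usage output in this post.
reported_int8_field = 28_400_884_275
reported_bbq_field = 1_443_088_252

# What is left of the int8 field after subtracting raw + int8 copies.
unaccounted = reported_int8_field - (raw_float32 + int8_quantized)

print(f"raw float32:        {gib(raw_float32):6.2f} GiB")
print(f"int8 only:          {gib(int8_quantized):6.2f} GiB")
print(f"bbq only:           {gib(bbq_quantized):6.2f} GiB")
print(f"raw + int8:         {gib(raw_float32 + int8_quantized):6.2f} GiB")
print(f"reported int8 field:{gib(reported_int8_field):6.2f} GiB")
print(f"reported bbq field: {gib(reported_bbq_field):6.2f} GiB")
print(f"int8 field minus (raw + int8): {gib(unaccounted):.2f} GiB")
```

Under these assumptions, raw plus int8 comes to 25.6 billion bytes against the 28.4 billion reported, which is at least consistent with the int8 figure including the uncompressed vectors. The reported BBQ figure is far below the raw float32 size on its own.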
For extra context, our kNN search uses a k of 30, a num_candidates of 200, and a rescore of 20. That’s definitely bigger than the k=8-10 I’ve seen around, so that’s also a possible cause of the slowdown, but I want to rule out any core architecture issues before modifying the query.
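For reference, a minimal sketch of what such a query body might look like, written as a Python dict. It assumes that "a rescore of 20" corresponds to `rescore_vector.oversample` on the knn clause, and it uses a placeholder query vector. If the documented oversampling behaviour applies (the top k * oversample candidates per shard are rescored against the original vectors), the last lines estimate how much uncompressed float32 data one query would read per shard.

```python
import math

# Values from the post; OVERSAMPLE is an assumed reading of "a rescore of 20".
K = 30
NUM_CANDIDATES = 200
OVERSAMPLE = 20.0
DIMS = 512

knn_search_body = {
    "knn": {
        "field": "image_vector_int8",
        "query_vector": [0.0] * DIMS,  # placeholder, not a real embedding
        "k": K,
        "num_candidates": NUM_CANDIDATES,
        "rescore_vector": {"oversample": OVERSAMPLE},
    }
}

# Candidates rescored per shard if rescoring covers k * oversample vectors.
rescored_per_shard = math.ceil(K * OVERSAMPLE)

# Each rescored candidate needs its full float32 vector (4 bytes per dimension).
bytes_read_per_shard = rescored_per_shard * DIMS * 4

print("rescored candidates per shard:", rescored_per_shard)
print("float32 bytes read per shard: ", bytes_read_per_shard)
```

The point of the estimate is only that rescoring touches the uncompressed vectors, so whether those fit in the page cache could matter for latency as well as the query parameters themselves.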