I've been running some tests on Better Binary Quantization (BBQ) in Elasticsearch and comparing it with the default configuration for dense vectors, but I'm not observing the expected differences in disk size or search performance.
I embedded 100K comments, each as a 1024-dimensional dense vector, and tested both configurations. However, the disk usage and search time appear to be almost identical, with no significant improvement from the BBQ configuration. In fact, the default configuration sometimes seems faster in terms of search time.
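For reference, the two mappings look roughly like this (the field name and similarity here are just examples; the BBQ index sets `index_options`, while the other keeps the defaults):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "comment_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine",
        "index_options": { "type": "bbq_hnsw" }
      }
    }
  }
}

PUT my-index-2
{
  "mappings": {
    "properties": {
      "comment_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```

If I understand correctly, on recent 8.x releases the default is itself `int8_hnsw`, so "default" here may not mean full float32.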
Index Sizes:
BBQ Index (my-index): 1.9GB
Default Index (my-index-2): 1.9GB
As shown, there is no difference in disk size between the two configurations.
Questions:
Index Size Comparison: How can I accurately measure the size of each index (BBQ vs Default) to check for differences in disk usage?
Performance Differences: Has anyone encountered similar results? What settings or tests can I adjust to identify any potential improvements with BBQ?
The disk footprint is dominated by the raw floating-point vectors, which are kept on disk for rescoring and merging; the quantized copy is an addition, not a replacement, so the total index size barely changes.
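As a back-of-the-envelope: 100,000 vectors × 1024 dims × 4 bytes per float32 ≈ 410 MB of raw vector data, present in both indices. The BBQ copy adds only about 1 bit per dimension, i.e. 1024 / 8 = 128 bytes per vector ≈ 13 MB, which is invisible next to the raw floats, the HNSW graph, and the stored `_source` in a 1.9 GB index.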
How did you determine your disk footprint? (which API, or looking directly at the directory, etc.)
For performance differences, it's useful to know the queries used (the entire search request), the ES version, and the hardware on which it's tested.
Also, how are you measuring search time? Is this the `took` time in the response, or measured client side?
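If you want to see where the bytes actually go, the disk usage analysis API gives a per-field breakdown, which should show the float32 vector data dominating:

```
POST /my-index/_disk_usage?run_expensive_tasks=true
```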
I've been using the `GET /_cat/indices/full-precision-index?v` command to track disk usage, and I also rely on the `GET /_stats/store` API to get a closer look at the storage details.
I've been checking the `took` time client side, as you mentioned.
My Docker setup is showing the following stats for the Elasticsearch container:
Why are the search results showing full-precision floating-point values for the vectors, even though the BBQ index configuration should use binary precision?
If we don't keep the raw floating-point values in `_source`, they are still persisted by the underlying Lucene index, right? Would it be fine to fetch the values from Lucene for rescoring purposes?
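For context, what I have in mind is excluding the vector field from `_source` in the mapping, something like this (the field name is just an example):

```
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [ "comment_vector" ]
    },
    "properties": {
      "comment_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true
      }
    }
  }
}
```

The vector then no longer appears in the stored `_source` JSON, but Lucene's vector format still keeps its own float32 copy, which is what the question above is about.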
For re-quantization and segment merging, may I ask whether we indeed need to keep the raw floating-point values for every type of quantization? And how does re-quantization work? I'd appreciate it if you could share any resources.
We keep the floating-point values within Lucene. During the merge process, we re-quantize against the new centroid of the segment created by the merge. The only resources on this are the format code itself. But to put it simply: we re-read the float32 vectors and quantize them again.
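A simplified picture for a binary scheme like BBQ (the real implementation also stores correction factors, so treat this as a sketch): each vector $x$ is quantized relative to its segment's centroid $c$,

$$
b_i = \begin{cases} 1 & \text{if } x_i - c_i > 0 \\ 0 & \text{otherwise} \end{cases}
$$

so when segments merge and the centroid moves to $c'$, the bits have to be recomputed against $c'$, which is why the float32 vectors are re-read.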
May I ask whether the raw float values are needed for all types of quantization, like int8, int4, and BBQ? Would you be able to share any tutorials or resources explaining the re-quantization process? Much appreciated, thanks.
The context for my question is that we are running into huge storage costs for those raw float values, and we would like to explore the option of keeping only the quantized vectors in storage and deleting the raw floats. For re-ranking we would then use contextual re-ranking, which does not require the raw float values.
May I ask about the following statement in [1]: does discarding the raw floating-point vectors mean that they are not needed for re-quantization? Thanks a lot.
> Furthermore, we anticipate that with 7 bit quantization we should be able to discard the raw floating point vectors and plan to evaluate this thoroughly.
That is the key part of the quote. It is not implemented yet, but it is something we want to eventually implement. Or at least, quantization and compression of the floats at some lower bit size to use less disk space.
Hi @BenTrent, as always, thanks for your quick reply.
It is totally understandable w.r.t. the schedule of the implementation and evaluation. With my question I mainly want to clarify the reasoning behind the statement:
> with 7 bit quantization we should be able to discard the raw floating point vectors
May I ask you to elaborate on it a bit more? Thanks a lot.
The main reasoning is that quantization at that bit size may provide such good accuracy that the original vectors won't be needed for typical search or merging operations.
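For scale, simple arithmetic rather than a benchmark: at 7 bits the quantized vector still occupies one byte per dimension, so a 1024-dim vector would take roughly 1 KB instead of the 4 KB of its float32 copy, i.e. dropping the floats would cut per-vector storage about 4x.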
@BenTrent thanks. Does that mean that for re-quantization it is good enough to use the already-quantized 7-bit values, or that re-quantization is not even needed?
@yli it would mean that for "re-quantization" we might still have to adjust the centroid, but it might be possible to simply rehydrate (de-quantize) the int7 vectors and then re-quantize against the new centroid.
Centroids might actually shift over the lifetime of the data, especially for vectors that are near each other in index order (e.g. images from a video).
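To make the rehydrate-then-requantize idea concrete for a linear scalar quantizer (simplified; the actual quantizer also tracks correction terms): with a per-segment range $[\ell, u]$ and $b$ bits,

$$
q_i = \operatorname{round}\!\left(\frac{x_i - \ell}{u - \ell}\,(2^b - 1)\right),
\qquad
\hat{x}_i = \ell + q_i\,\frac{u - \ell}{2^b - 1},
$$

so on a merge one could de-quantize with the old $(\ell, u)$ and then re-quantize $\hat{x}$ against the merged segment's new range. The hope is that at 7 bits the extra rounding error is small enough that accuracy survives without the original floats.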