We are planning to use Elasticsearch for vector search in our application, and we need to assess the feasibility of removing the storage of the original floating point vectors from Lucene (which Elasticsearch uses internally), mainly because the storage size is very large for us.
I have read a few blog posts and tutorials on Elasticsearch Labs, and my basic understanding is that there are two main scalar quantization approaches in Lucene, represented by [1,2] and [3] respectively. To make them easy to distinguish, I will use OSQ to mean the quantization method in [3], and Non-OSQ to mean the one in [1,2].
My understanding is that both OSQ and Non-OSQ currently require the storage of the original floating point vectors.
OSQ needs to compute the centroid $m$ from the original floating point vectors.
Is $m$ also computed per segment? If so, does that mean a new centroid needs to be re-computed during segment merging?
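To make sure I picture this correctly, I imagine the per-segment computation as something like this minimal sketch (hypothetical names, not Lucene's actual API):

```java
import java.util.List;

class SegmentCentroid {
    // Minimal sketch: a segment's centroid as the component-wise mean
    // of its raw float vectors. Names here are hypothetical.
    static float[] computeCentroid(List<float[]> segmentVectors) {
        int dim = segmentVectors.get(0).length;
        float[] centroid = new float[dim];
        for (float[] v : segmentVectors) {
            for (int i = 0; i < dim; i++) {
                centroid[i] += v[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            centroid[i] /= segmentVectors.size();
        }
        return centroid;
    }
}
```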
Non-OSQ might need to re-compute the quantization interval during segment merging, and possibly even re-quantize, which requires the original floating point vectors.
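Likewise, here is a rough sketch of the re-quantization step I have in mind (using a naive clamp-to-interval scheme purely for brevity; my understanding from [1,2] is that Lucene actually derives $[a, b]$ from a confidence interval over the value distribution):

```java
class Requantize {
    // Rough sketch: re-quantize a raw float vector to n-bit integers
    // over an interval [a, b]. Hypothetical, not Lucene's actual code.
    // Assumes bits <= 7 so the values fit in a signed byte.
    static byte[] quantize(float[] vector, float a, float b, int bits) {
        int maxQ = (1 << bits) - 1;      // 2^n - 1 quantization levels
        float step = (b - a) / maxQ;
        byte[] q = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.min(b, Math.max(a, vector[i]));
            q[i] = (byte) Math.round((clamped - a) / step);
        }
        return q;
    }
}
```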
While reading [3], I found it encouraging that you mentioned:
> We plan to roll out 2, 4 and 7 bit quantization schemes and indeed unify all our index compression options to use this same underlying technique. Furthermore, we anticipate that with 7 bit quantization we should be able to discard the raw floating point vectors and plan to evaluate this thoroughly.
This seems to be exactly what we need in our case, and we understand you likely have your own priorities for the unification and evaluation above. May I ask whether you already have a rough schedule for when you plan to do both?
Okay, does that mean Elasticsearch/Lucene only re-quantizes (based on the raw floating point vectors) the merging segments whose original $m$ has drifted too far from the new $m$ computed over all merging segments?
Ben was referring to the fact that your suggestion, detecting whether $m$ has drifted too far, is not an optimization we have built yet, but it is one we have already considered.
During merge we do re-calculate the centroid needed for quantization and write the new vectors out. To your earlier point about optimization: right now we do this every time, but we want to optimize it further in the future.
Feel free to ask additional questions. Happy to talk this through with you more. Sorry I missed this thread earlier. It's a fun topic for me.
Just to rephrase your info: during a segment merge, the new centroid is calculated as a weighted sum of the centroids from each segment, and then re-quantization is always done in the last step(s) of the merge, based on the new centroid and the stored original floating point vectors, right?
> during a segment merge, the new centroid is calculated as a weighted sum of the centroids from each segment, and then re-quantization is always done in the last step(s) of the merge, based on the new centroid and the stored original floating point vectors, right?
correct. you got it!
There are a few edge cases where we will rebuild the new centroid from the original floating point vectors, such as when there are deleted docs. But typically what you described is exactly what happens.
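If it helps, in pseudo-Java the merge-time centroid step looks roughly like this (hypothetical types and names, simplified from what Lucene actually does):

```java
import java.util.List;

// Hypothetical per-segment stats; not Lucene's actual types.
record SegmentStats(float[] centroid, long liveDocCount) {}

class CentroidMerge {
    // Weighted average of per-segment centroids, weighted by doc count.
    // When segments contain deletions, the stored centroids no longer
    // reflect the live docs, which is when we fall back to recomputing
    // the centroid directly from the raw float vectors instead.
    static float[] mergeCentroids(List<SegmentStats> segments) {
        int dim = segments.get(0).centroid().length;
        float[] merged = new float[dim];
        long total = 0;
        for (SegmentStats s : segments) {
            for (int i = 0; i < dim; i++) {
                merged[i] += s.centroid()[i] * s.liveDocCount();
            }
            total += s.liveDocCount();
        }
        for (int i = 0; i < dim; i++) {
            merged[i] /= total;
        }
        return merged;
    }
}
```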
> the set of all possible quantized approximation of the original floating point vectors
I think a quantized vector is an approximation of the original floating point vector, so to me this reads okay. It's basically shorthand, and I believe the two statements say the same thing.
Feel free to dig in here if you think I'm missing some subtlety.
Hi @john-wagster, sorry that my rephrasing was not very clear either. By quantized vectors, I mean the vectors that only contain the integer values after quantization. By quantized approximation of the original floating point vectors, I meant the float-valued vectors given by the following formula.
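Writing it out explicitly (in my own notation, so as a reconstruction it should be something like the standard de-quantization form):

$$\tilde{x}_i = a + q_i \cdot \frac{b - a}{2^n - 1}, \qquad q_i \in \{0, 1, \ldots, 2^n - 1\},$$

where $q_i$ are the stored integer values, $n$ is the number of quantization bits, and $[a, b]$ is the quantization interval.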
The quantized vectors contain values from $0$ to $2^n - 1$, depending on the number of bits $n$ used for quantization. But the value range $[a, b]$ can be at a different scale than $[0, 2^n - 1]$, and is related to the values in the formula above. I hope this explains my thought a bit more clearly.
Ah, I think I understand your question better now. The passage that references quantized vectors is referring to vectors that are quantized but without any corrections applied.
The prior passage that references quantized approximation of the original floating point vectors refers to the quantized vectors after corrections have been applied. As the passage mentions, we never work with that quantity directly. Instead we compute the estimated distances between the quantized vectors and then apply corrections to recover the approximate distances between the original float vectors.
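To make that concrete, for plain scalar quantization the algebra is roughly the following (a sketch in my own notation, not the exact code): with $\tilde{x}_i = a + s\,q^x_i$ and $s = (b - a)/(2^n - 1)$, the dot product of two approximations expands as

$$\langle \tilde{x}, \tilde{y} \rangle = d\,a^2 + a\,s \sum_{i=1}^{d} q^x_i + a\,s \sum_{i=1}^{d} q^y_i + s^2 \sum_{i=1}^{d} q^x_i\, q^y_i,$$

so we can compute the cheap integer dot product $\sum_i q^x_i q^y_i$ and then apply the precomputed per-vector sums as correction terms, without ever materializing $\tilde{x}$ or $\tilde{y}$.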
To answer your original question, though, I still believe the article is correct in how it distinguishes the two.
Does that help allay any confusion there?
My apologies, it's been a while since I read through the article, and I had just forgotten about that prior passage.
Hi @john-wagster, no problem, thanks for the clarification. Are the correction terms mainly used during the dot product calculation? The geometry example seems to me to be more about describing the geometric properties of the vectors.