Question about Optimized Scalar Quantization

Hi, Good day,

We are planning to use Elasticsearch for vector search in our application, and we would like to check whether it is feasible to remove the storage of the original floating point vectors from Lucene (which Elasticsearch uses internally), mainly because the storage size is too large for us.

I have read a few blog posts and tutorials on Elasticsearch Labs, and my basic understanding is that there are two main scalar quantization methods/approaches in Lucene, represented by [1,2] and [3] respectively. To make it easy to distinguish them, I will use OSQ to mean the quantization method in [3], and Non-OSQ to mean the one in [1,2].

My understanding is that both OSQ and Non-OSQ currently require the storage of the original floating point vectors.

  • OSQ needs to compute the centroid $m$ based on the original floating point vectors (see the sketch after this list).
    • Is $m$ also computed segment-wise? If yes, does that mean a new centroid needs to be re-computed during segment merging?
  • Non-OSQ might need to re-compute the quantization interval during segment merging, and maybe even re-quantize, which requires the original floating point vectors.
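
For context, here is a minimal sketch of my mental model (not Lucene's actual code): quantization first centers each vector on the segment centroid $m$, so $m$ has to exist before any codes can be produced. The `quantize_segment` helper and the min/max interval below are my own simplifications; real OSQ optimizes the per-vector interval rather than using the raw min/max.

```python
import numpy as np

def quantize_segment(vectors: np.ndarray, n_bits: int = 7):
    """Center vectors on the segment centroid m, then scalar-quantize each one.

    Illustrative helper only; the real method chooses the interval [a, b]
    by optimization instead of the raw min/max used here.
    """
    m = vectors.mean(axis=0)                  # segment-wise centroid
    centered = vectors - m
    levels = (1 << n_bits) - 1                # 2^n - 1 quantization steps
    a = centered.min(axis=1, keepdims=True)   # per-vector lower interval bound
    b = centered.max(axis=1, keepdims=True)   # per-vector upper interval bound
    step = (b - a) / levels
    codes = np.round((centered - a) / step).astype(np.int64)
    return m, codes, a, step

# Example: four 3-dimensional vectors quantized to 7-bit codes.
vectors = np.array([[0.1, 0.9, 0.3],
                    [0.2, 0.8, 0.4],
                    [0.9, 0.1, 0.5],
                    [0.8, 0.2, 0.6]])
m, codes, a, step = quantize_segment(vectors)
print(m)      # the centroid the codes depend on
print(codes)  # integer codes in [0, 2^n - 1]
```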

From reading [3], it is encouraging that you mentioned:

We plan to roll out 2, 4 and 7 bit quantization schemes and indeed unify all our index compression options to use this same underlying technique. Furthermore, we anticipate that with 7 bit quantization we should be able to discard the raw floating point vectors and plan to evaluate this thoroughly.

This seems to be what we need in our case, and we understand that you may have set priorities for the unification and evaluation above. May I ask whether you already have a rough schedule for when you plan to do both of them?

Thanks a lot and look forward to your reply.

[1] Understanding scalar quantization in Lucene - Elasticsearch Labs
[2] Scalar quantization optimized for vector databases - Elasticsearch Labs
[3] Understanding optimized scalar quantization - Elasticsearch Labs

@BenTrent I am continuing our previous communication here so we have a formal topic defined.

We have no schedule on when this will be implemented. It has been deprioritized due to other work currently.

@BenTrent thanks for the information w.r.t. the schedule.

May I ask whether my questions about $m$ in the description are on the right track? Thanks.

  • Is $m$ also computed segment-wise? If yes, does that mean a new centroid needs to be re-computed during segment merging?

Correct, it's per segment and calculated on merge. But on merge, we can simply combine the already calculated segment centroids and compute a weighted average.
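
In other words, something like the following sketch (illustrative only, not the actual Lucene merge code):

```python
import numpy as np

def merged_centroid(segment_centroids, segment_sizes):
    """Combine per-segment centroids by vector count instead of re-reading raw vectors."""
    centroids = np.asarray(segment_centroids, dtype=np.float64)
    sizes = np.asarray(segment_sizes, dtype=np.float64)
    return (centroids * sizes[:, None]).sum(axis=0) / sizes.sum()

# Two segments with 1000 and 3000 vectors: the merged centroid is pulled
# toward the larger segment's centroid.
print(merged_centroid([[0.0, 1.0], [1.0, 0.0]], [1000, 3000]))  # -> [0.75 0.25]
```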

@BenTrent thanks for the info.

Okay, does that mean Elasticsearch/Lucene only re-quantizes, based on the raw floating point vectors, those merging segments whose original $m$ has drifted too far from the newly computed $m$ over all merging segments?

@BenTrent may I get your answer for the above question? Thanks a lot.

This is not how BBQ is done today, but this is definitely an optimization we want to apply eventually.

This is how we handle it for our older int4 and int8 quantization indices.

Hi @BenTrent, thanks for your reply.

This is not how BBQ is done today, but this is definitely an optimization we want to apply eventually.

Do you mean that the current BBQ using OSQ does not re-quantize on segment merges?

@BenTrent may I get your input here? Thanks, and sorry for the frequent questions.

Ben was referring to the fact that your suggestion to detect whether $m$ has drifted too far is not an optimization we have built yet, but it is one we have already considered.

During a merge we do re-calculate the centroid that is needed for quantization and write those new vectors out. And to your earlier point about optimization: right now we do this every time, but we want to optimize this further in the future.
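
A rough sketch of that flow (my simplification with a min/max interval and an assumed `merge_and_requantize` helper, not the actual merge code):

```python
import numpy as np

def merge_and_requantize(segments, n_bits=7):
    """segments: list of (raw_vectors, centroid) pairs for the segments being merged."""
    sizes = np.array([len(raw) for raw, _ in segments], dtype=np.float64)
    centroids = np.array([c for _, c in segments], dtype=np.float64)
    # weighted average of the per-segment centroids (as discussed above)
    new_m = (centroids * sizes[:, None]).sum(axis=0) / sizes.sum()

    levels = (1 << n_bits) - 1
    merged_codes = []
    for raw, _ in segments:
        centered = raw - new_m                      # re-center on the merged centroid
        a = centered.min(axis=1, keepdims=True)
        b = centered.max(axis=1, keepdims=True)
        step = np.maximum((b - a) / levels, 1e-12)  # guard against degenerate vectors
        merged_codes.append(np.round((centered - a) / step).astype(np.int64))
    return new_m, np.vstack(merged_codes)           # new centroid + re-quantized codes

# Example: merge two tiny segments whose centroids were computed earlier.
seg_a = (np.array([[0.1, 0.9], [0.2, 0.8]]), np.array([0.15, 0.85]))
seg_b = (np.array([[0.9, 0.1]]), np.array([0.9, 0.1]))
print(merge_and_requantize([seg_a, seg_b]))
```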

Feel free to ask additional questions. Happy to talk this through with you more. Sorry I missed this thread earlier. It's a fun topic for me.

Hi @john-wagster, thanks for your answers.

Just to rephrase your info: during a segment merge, the new centroid is calculated as a weighted average of the centroids from each segment, and then re-quantization is always done in the last step(s) of the merge based on the new centroid and the stored original floating point vectors, right?

during a segment merge, the new centroid is calculated as a weighted average of the centroids from each segment, and then re-quantization is always done in the last step(s) of the merge based on the new centroid and the stored original floating point vectors, right?

Correct, you got it!

There are a few edge cases where we will rebuild the new centroid from the original floating point vectors, such as when there are deleted docs. But typically what you described is exactly what happens.

You can review it in the code yourself here:

@john-wagster thanks a lot for the confirmation and for providing more detailed information.

No problem let me know if you have other questions!

@john-wagster thanks, I would like to clarify one point in [1], about the following highlighted part.

Instead of

the set of all possible quantized vectors

Shouldn't it be

the set of all possible quantized approximation of the original floating point vectors

?

Thanks.

[1] Understanding optimized scalar quantization - Elasticsearch Labs

the set of all possible quantized vectors

the set of all possible quantized approximation of the original floating point vectors

I think a quantized vector is an approximation of the original floating point vector, so to me this reads ok. It's basically a shorthand, and I believe the two statements say the same thing.

Feel free to dig in here if you think I'm missing some subtlety.

Hi @john-wagster, sorry that my rephrasing was not very clear either. By quantized vectors, I mean the vectors that only contain the values after quantization. By quantized approximation of the original floating point vectors, I meant the float-valued vectors shown in the following formula.

The quantized vectors contain values from 0 to 2^n - 1, depending on the number of bits n used for quantization. But the value range [a, b] could be at a different scale than [0, 2^n - 1], and is related to the values shown in the above screenshot. I hope this explains my thought a bit more clearly.

Ah, I think I better understand your question now. The passage that references quantized vectors is referring to those which are quantized but without any corrections applied.

The prior passage that references the quantized approximation of the original floating point vectors is a reference to the quantized vectors after corrections have been applied. As the passage mentioned, we never directly work with that quantity. Instead we compute the estimated distances between the quantized vectors and then apply corrections to recover the approximate distances between the original float vectors.
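
As a toy illustration (using a simplified min/max quantizer rather than the article's exact correction terms), the integer codes plus a handful of stored per-vector scalars are enough to estimate the float dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 16, 7
levels = (1 << n_bits) - 1
m = rng.normal(size=dim)                     # centroid shared by the segment

def quantize(x):
    c = x - m                                # center on m
    a, b = c.min(), c.max()                  # per-vector interval [a, b]
    s = (b - a) / levels                     # step size
    q = np.round((c - a) / s)                # integer codes in [0, 2^n - 1]
    # per-vector correction terms stored alongside the codes
    return q, a, s, q.sum(), np.dot(x, m)

x, y = rng.normal(size=dim), rng.normal(size=dim)
qx, ax, sx, sumx, xm = quantize(x)
qy, ay, sy, sumy, ym = quantize(y)

# estimate <x - m, y - m> from the codes and the per-vector scalars only
centered_est = ax * ay * dim + ax * sy * sumy + ay * sx * sumx + sx * sy * np.dot(qx, qy)
# apply the corrections to recover an approximate float dot product
dot_est = centered_est + xm + ym - np.dot(m, m)
print(np.dot(x, y), dot_est)                 # the estimate should be close to the true value
```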

I still believe the article is correct though in how it distinguishes those to answer your original question.

Does that help allay any confusion there?

My apologies, it's been a bit since I've read through the article, and I had just forgotten about that prior passage.

Hi @john-wagster, no problem, thanks for your clarification. Are the correction terms used mainly during the dot product calculation? The geometry example, to me, seems more about describing the geometric properties of the vectors.