I've been running some tests on Better Binary Quantization (BBQ) in Elasticsearch and comparing it with the default configuration for dense vectors, but I'm not observing the expected differences in disk size or search performance.
I embedded 100K comments, each as a 1024-dimensional dense vector, and tested both configurations. However, the disk usage and search time appear to be almost identical, with no significant improvement from the BBQ configuration. In fact, the default configuration sometimes seems faster in terms of search time.
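For reference, the two mappings look roughly like this (the field name and similarity here are just examples; the BBQ index sets `index_options`, while the other keeps the defaults):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "comment_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine",
        "index_options": { "type": "bbq_hnsw" }
      }
    }
  }
}

PUT my-index-2
{
  "mappings": {
    "properties": {
      "comment_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
```

If I understand correctly, on recent 8.x releases the default is itself `int8_hnsw`, so "default" here may not mean full float32.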
Index Sizes:
BBQ Index (my-index): 1.9GB
Default Index (my-index-2): 1.9GB
As shown, there is no difference in disk size between the two configurations.
Questions:
Index Size Comparison: How can I accurately measure the size of each index (BBQ vs Default) to check for differences in disk usage?
Performance Differences: Has anyone encountered similar results? What settings or tests can I adjust to identify any potential improvements with BBQ?
The disk footprint is dominated by the raw floating-point vectors, which are kept on disk for rescoring and merging; the quantized copy is an addition, not a replacement, so the total index size barely changes.
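As a back-of-the-envelope: 100,000 vectors × 1024 dims × 4 bytes per float32 ≈ 410 MB of raw vector data, present in both indices. The BBQ copy adds only about 1 bit per dimension, i.e. 1024 / 8 = 128 bytes per vector ≈ 13 MB, which is invisible next to the raw floats, the HNSW graph, and the stored `_source` in a 1.9 GB index.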
How did you determine your disk footprint? (which API, or looking directly at the directory, etc.)
For performance differences, it's useful to know the queries used (the entire search request), the ES version, and the hardware on which it's tested.
Also, how are you measuring search time? Is this the `took` time in the response, or measured client side?
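If you want to see where the bytes actually go, the disk usage analysis API gives a per-field breakdown, which should show the float32 vector data dominating:

```
POST /my-index/_disk_usage?run_expensive_tasks=true
```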
I've been using the `GET /_cat/indices/full-precision-index?v` command to track disk usage, and I also rely on the `GET /_stats/store` API to get a closer look at the storage details.
I've been checking the `took` time client side, as you mentioned.
My Docker setup is showing the following stats for the Elasticsearch container:
Why are the search results showing full-precision floating-point values for the vectors, even though the BBQ index configuration should use binary precision?
If we don't keep the raw floating-point values in `_source`, they are still persisted by the underlying Lucene index, right? Would it be fine to fetch the values from Lucene for rescoring purposes?
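For context, what I have in mind is excluding the vector field from `_source` in the mapping, something like this (the field name is just an example):

```
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [ "comment_vector" ]
    },
    "properties": {
      "comment_vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true
      }
    }
  }
}
```

The vector then no longer appears in the stored `_source` JSON, but Lucene's vector format still keeps its own float32 copy, which is what the question above is about.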
For re-quantization and segment merging, may I ask whether we indeed need to keep the raw floating-point values for every type of quantization? And how does re-quantization work? I'd appreciate it if you could share any resources.
We keep the floating-point values within Lucene. During the merge process, we re-quantize against the new centroid of the segment created by the merge. The only resources on this are the format code itself. But to put it simply: we re-read the float32 vectors and quantize them again.
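A simplified picture for a binary scheme like BBQ (the real implementation also stores correction factors, so treat this as a sketch): each vector $x$ is quantized relative to its segment's centroid $c$,

$$
b_i = \begin{cases} 1 & \text{if } x_i - c_i > 0 \\ 0 & \text{otherwise} \end{cases}
$$

so when segments merge and the centroid moves to $c'$, the bits have to be recomputed against $c'$, which is why the float32 vectors are re-read.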
May I ask whether the raw float values are needed for all types of quantization, like int8, int4, and BBQ? Would you be able to share any tutorials or resources explaining the re-quantization process? Much appreciated, thanks.
The context for my question is that we are running into huge storage costs for those raw float values, and we would like to explore the option of keeping only the quantized vectors in storage and deleting the raw floats. For re-ranking we would then use contextual re-ranking, which does not require the raw float values.
May I ask about the following statement in [1]: does discarding the raw floating-point vectors mean that they are not needed for re-quantization? Thanks a lot.
> Furthermore, we anticipate that with 7 bit quantization we should be able to discard the raw floating point vectors and plan to evaluate this thoroughly.
That is the key part of the quote. It is not implemented yet, but it is something we want to eventually implement. Or at least, quantization and compression of the floats at some lower bit size to use less disk space.
Hi @BenTrent, as always, thanks for your quick reply.
It is totally understandable w.r.t. the schedule of the implementation and evaluation. With my question I mainly want to clarify the reasoning behind the statement:
> with 7 bit quantization we should be able to discard the raw floating point vectors
May I ask you to elaborate on it a bit more? Thanks a lot.
The main reasoning is that quantization at that bit size may provide such good accuracy that the original vectors won't be needed for typical search or merging operations.
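For scale, simple arithmetic rather than a benchmark: at 7 bits the quantized vector still occupies one byte per dimension, so a 1024-dim vector would take roughly 1 KB instead of the 4 KB of its float32 copy, i.e. dropping the floats would cut per-vector storage about 4x.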
@BenTrent thanks. Does that mean that for re-quantization it is good enough to use the already-quantized 7-bit values, or that re-quantization is not even needed?
@yli it would mean that for "re-quantization" we might still have to adjust the centroid, but it might be possible to simply rehydrate (de-quantize) the int7 vectors and then re-quantize against the new centroid.
Centroids might actually shift over the lifetime of the data, especially for vectors that are near each other in index order (e.g. images from a video).
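To make the rehydrate-then-requantize idea concrete for a linear scalar quantizer (simplified; the actual quantizer also tracks correction terms): with a per-segment range $[\ell, u]$ and $b$ bits,

$$
q_i = \operatorname{round}\!\left(\frac{x_i - \ell}{u - \ell}\,(2^b - 1)\right),
\qquad
\hat{x}_i = \ell + q_i\,\frac{u - \ell}{2^b - 1},
$$

so on a merge one could de-quantize with the old $(\ell, u)$ and then re-quantize $\hat{x}$ against the merged segment's new range. The hope is that at 7 bits the extra rounding error is small enough that accuracy survives without the original floats.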