No Observable Difference Between BBQ and Default Configurations in Elasticsearch – Help with Index Size Comparison

I've been running some tests on Better Binary Quantization (BBQ) in Elasticsearch and comparing it with the default configuration for dense vectors, but I'm not observing the expected differences in disk size or search performance.

Test Setup:

BBQ Index Configuration (my-index):

{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index_options": {
          "type": "bbq_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  }
}

Default Index Configuration (my-index-2):

{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index_options": {
          "type": "int8_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  }
}

Problem:

I embedded 100K comments, each with a 1024-dimensional vector (IMS), and tested both configurations. However, the disk usage and search time appear to be almost identical, with no significant improvements from the BBQ configuration. In fact, the default configuration sometimes seems faster in terms of search time.

Index Sizes:

  • BBQ Index (my-index): 1.9GB
  • Default Index (my-index-2): 1.9GB

As shown, there is no difference in disk size between the two configurations.

Questions:

  • Index Size Comparison: How can I accurately measure the size of each index (BBQ vs Default) to check for differences in disk usage?
  • Performance Differences: Has anyone encountered similar results? What settings or tests can I adjust to identify any potential improvements with BBQ?

Hi @mohab_ghobashy , welcome to our community.

Have you read these articles?

Hey @mohab_ghobashy

The disk footprint is dominated by the raw floating point vectors.

How did you determine your disk footprint? (which API, or looking directly at the directory, etc.)
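
If you want a per-field breakdown (to see how much of that footprint is the raw float vectors and _source versus everything else), the disk usage analyze API can help. A minimal sketch with Python and requests, assuming an unsecured local cluster and your two index names:

import requests

ES = "http://localhost:9200"  # adjust host/auth for your cluster

# Per-field on-disk breakdown; run_expensive_tasks is required for this API.
for index in ["my-index", "my-index-2"]:
    resp = requests.post(
        f"{ES}/{index}/_disk_usage",
        params={"run_expensive_tasks": "true"},
    )
    resp.raise_for_status()
    for field, stats in sorted(resp.json()[index]["fields"].items()):
        print(f"{index:12} {field:20} {stats['total']}")

If the vector field and _source dominate in both indices, that would explain why the two totals look nearly identical.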

For performance differences, it's useful to know the queries used (the entire search request), the ES version, and the hardware on which it's tested.

Also, how are you measuring search time? Is this the 'took' time in the request or measured client side?

Hey @BenTrent,

Thanks for your thoughts!

I’ve been using the GET /_cat/indices/full-precision-index?v command to track disk usage, and I also rely on the GET /_stats/store API to get a closer look at the storage details.

I've been checking the took time on the client side, like you mentioned.
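
For reference, the timing loop looks roughly like this (host, index name, and query vector are placeholders):

import time
import requests

ES = "http://localhost:9200"      # adjust host/auth for your setup
INDEX = "my-index"                # or "my-index-2"
QUERY_VECTOR = [0.1] * 1024       # placeholder for a real 1024-dim embedding

body = {
    "knn": {
        "field": "vector",
        "query_vector": QUERY_VECTOR,
        "k": 10,
        "num_candidates": 100,
    }
}

start = time.perf_counter()
resp = requests.post(f"{ES}/{INDEX}/_search", json=body)
wall_ms = (time.perf_counter() - start) * 1000

print("took (server-side, ms):", resp.json()["took"])
print("wall clock (client-side, ms):", round(wall_ms, 1))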

My Docker setup is showing the following stats for the Elasticsearch container:

  • CPU Usage: 1.95%
  • Memory Usage: 5.017GiB / 15.25GiB (32.89%)
  • Block I/O: 3.35GB / 26.1GB

Why are the search results showing full-precision floating-point values for the vectors, even though the BBQ index configuration should use binary precision?

@mohab_ghobashy we keep the raw floating point values around; _source is what you provide to ES.

Having the raw values is important for:

  • reindexing
  • rescoring via the raw values, if desired
  • re-quantizing during segment merges

Usually, there is no good reason to actually return the raw vector client side.

I would adjust your search to return only the text field:

query = {
  "knn": {...},
  "_source": {"includes": ["my_field"]}
}

This should give you a performance boost, as serializing many floating point values is very expensive.
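
For example, a fuller request body for your mapping could look like this ("comment_text" is just a placeholder for whatever text field your documents actually carry):

query_embedding = [0.1] * 1024  # your real 1024-dim query vector goes here

search_body = {
    "knn": {
        "field": "vector",
        "query_vector": query_embedding,
        "k": 10,
        "num_candidates": 100,
    },
    # Return only the text; skip serializing the stored 1024-dim float vectors.
    "_source": {"includes": ["comment_text"]},
}

Excluding the vector field explicitly, "_source": {"excludes": ["vector"]}, works just as well if you want everything else back.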

Hi @BenTrent ,

Thanks for sharing your insights, and sorry for injecting questions/comments here.

  1. Keeping the raw floating point values around in _source seems to be an inefficient approach, as discussed in Knn_vectors field understanding - Elastic Stack / Elasticsearch - Discuss the Elastic Stack, right?
  2. If we don't keep the raw floating point values in _source, the values still persist in the underlying Lucene index, right? Would it be fine to fetch the values from Lucene for rescoring purposes?
  3. For re-quantizing and segment merging, may I ask whether we indeed need to keep the raw floating point values for every type of quantization? And how does the re-quantization work? I would appreciate it if you could share any resources.

Thanks a lot, and looking forward to your reply.

Best,
Yakun

  1. Correct
  2. Correct
  3. We keep the floating point values within Lucene. During the merge process, we re-quantize given the new centroid of the new segment created during the merge. The only resources on this would be the Lucene format code itself. But to put it simply, we re-read the float32 vectors and quantize them again.
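
To make that a bit more concrete, here is a toy numpy sketch of the idea (illustrative only, not Lucene's actual quantization code):

import numpy as np

def quantize(vectors: np.ndarray, centroid: np.ndarray, bits: int = 8):
    # Toy centroid-relative scalar quantization (illustration only).
    centered = vectors - centroid
    levels = (1 << (bits - 1)) - 1                  # e.g. 127 for 8 bits
    scale = np.abs(centered).max() / levels or 1.0  # float value per step
    return np.round(centered / scale).astype(np.int8), scale

# Two segments of raw float32 vectors that get merged into one.
seg_a = np.random.rand(1000, 1024).astype(np.float32)
seg_b = np.random.rand(500, 1024).astype(np.float32)
merged = np.vstack([seg_a, seg_b])

# Re-quantization: recompute the centroid over the merged data, then re-read
# the float32 vectors and quantize them against that new centroid.
new_centroid = merged.mean(axis=0)
quantized, scale = quantize(merged, new_centroid)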

Hi @BenTrent,

Thanks a lot for sharing the insights.

May I ask whether the raw floating point values are needed for all types of quantization, like int8, int4, and BBQ? Would it be fine for you to share some tutorials or resources that explain the re-quantization process? Really appreciated, thanks.

For context: we are running into huge storage usage from those raw floating point values, and we would like to check the option to keep only the quantized vector values in storage and delete the raw floating values. For re-ranking, we would then use contextual re-ranking, which does not require the raw floating point values.

We do keep the raw floats for all quantization types for the time being.

Would it be fine for you to share some tutorials or resources that explain the re-quantization process?

There really isn't anything. You can read the Lucene format files if you are curious.

we would like to check the option to keep only the quantized vector values in storage and delete the raw floating values.

There is no way currently to delete the raw vectors and still have Lucene work appropriately.

Note, this is different from having them in _source; _source is a completely different field in Lucene.

@BenTrent, thanks for your explanation.

May I ask about the following statement in [1]: does discarding the raw floating point vectors mean that they are not needed for re-quantization? Thanks a lot.

Furthermore, we anticipate that with 7 bit quantization we should be able to discard the raw floating point vectors and plan to evaluate this thoroughly.

[1] Understanding optimized scalar quantization - Elasticsearch Labs

That is the key part of the quote. It is not implemented yet, but it is something we want to implement eventually. Or at least quantization and compression of the floats at some lower bit size, to use less disk space.

But, there are only so many hours in the day :slight_smile:

Hi @BenTrent , as always, thanks for your quick reply.

It is totally understandable with respect to the schedule of the implementation and evaluation. With my question, I mainly want to clarify the reasoning behind the statement:

with 7 bit quantization we should be able to discard the raw floating point vectors

May I ask whether you could elaborate on it a bit more? Thanks a lot.

Ah, ok.

The main reasoning is that quantization at that bit size may provide such good accuracy that the original vectors won't be needed for typical search or merging operations.

@BenTrent thanks, does it mean that for re-quantization it would be good enough to use the already quantized 7-bit vectors, or that re-quantization would not be needed at all?

@yli it would mean that for "re-quantization" we might still have to adjust the centroid, but it might be possible to simply rehydrate (de-quantize) the int7 vectors, then re-quantize given the new centroid.

Centroids might actually shift over the lifetime of the data, especially for vectors that are near each other based on their index order (e.g. images from a video).
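
A rough sketch of what that rehydrate-then-re-quantize step could look like (again purely illustrative; this is not an existing Elasticsearch or Lucene code path):

import numpy as np

def dequantize(q: np.ndarray, centroid: np.ndarray, scale: float) -> np.ndarray:
    # Rehydrate quantized vectors back to approximate float32 values.
    return q.astype(np.float32) * scale + centroid

def requantize(vectors: np.ndarray, new_centroid: np.ndarray, bits: int = 7):
    # Quantize again relative to the new centroid (toy version).
    centered = vectors - new_centroid
    levels = (1 << (bits - 1)) - 1                  # 63 for 7 bits
    scale = np.abs(centered).max() / levels or 1.0  # float value per step
    return np.round(centered / scale).astype(np.int8), scale

# Pretend these int7 vectors came from a segment with a known centroid and scale.
old_centroid = np.zeros(1024, dtype=np.float32)
old_scale = 0.01
int7_vectors = np.random.randint(-63, 64, size=(1500, 1024), dtype=np.int8)

# Rehydrate, recompute the centroid for the merged segment, then re-quantize,
# without ever reading the original float32 vectors.
approx = dequantize(int7_vectors, old_centroid, old_scale)
new_centroid = approx.mean(axis=0)
requantized, new_scale = requantize(approx, new_centroid)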