When Does BBQ Quantization Outperform Scalar Quantization?

Hi all,

I’m experimenting with the new vector quantization formats in Elasticsearch 8.x and trying to figure out at what dataset size BBQ (binary quantization) really starts to outperform scalar quantization.

What I’m comparing:

  • ES814HnswScalarQuantizedVectorsFormat (scalar quantization, non-BBQ)
  • ES816HnswBinaryQuantizedVectorsFormat (BBQ)

Both are used on dense_vector fields with the HNSW index, and my goal is to reduce memory and disk usage without sacrificing too much recall or latency.
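For context, here is the rough back-of-envelope math I'm working from for per-vector storage at 768 dims (this ignores the HNSW graph itself and the small per-vector correction factors that both quantization schemes keep, so treat the numbers as lower bounds):

```python
# Approximate raw vector storage per document at 768 dimensions,
# ignoring HNSW graph memory and per-vector correction factors.

DIMS = 768

raw_float32 = DIMS * 4   # float32: 4 bytes per dimension
scalar_int8 = DIMS * 1   # scalar (int8) quantization: ~1 byte per dimension
bbq_binary = DIMS // 8   # BBQ (binary quantization): ~1 bit per dimension

print(f"float32: {raw_float32} B, int8: {scalar_int8} B, BBQ: {bbq_binary} B")
# float32: 3072 B, int8: 768 B, BBQ: 96 B
```

So on paper BBQ is roughly a 32x reduction over raw float32 and 8x over int8, which is why I'm keen to understand at what scale that actually shows up in latency and recall.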

My current mapping

{
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "cosine",
      "index_options": {
        "type": "bbq_hnsw"
      }
    }
  }
}
What I'd like to know:

  • Roughly how many vector documents (e.g. 100 K, 1 M, 10 M) you needed before seeing a noticeable improvement in query latency or resource usage with BBQ vs. scalar quantization.
  • Any trade‑offs in recall or accuracy you observed at various scales.
  • Tips on benchmarking methodology (e.g. how many query patterns to test, warm‑up strategies, monitoring settings).

Has anyone here benchmarked both formats in prod or at scale? What dataset sizes or traffic volumes tipped the balance in favor of BBQ for you?

Thanks in advance for any insights or numbers you can share!

Hey there, unfortunately it doesn't look like BBQ is published on our nightly benchmarks (yet), but we have performed benchmarking using datasets such as openai-vector.

We have also published a blog post, including the detailed Rally configuration used for some of that benchmarking, if you want to replicate it yourself.

If you are experimenting with BBQ, we would strongly suggest oversampling and rescoring. Internally we've seen good recall results with an oversample/rescore factor of 3.0 if you're interested in experimenting with this.
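As a sketch, in newer 8.x releases a kNN search can request oversampling and rescoring directly via `rescore_vector` (field name and query vector here are placeholders; on versions without `rescore_vector`, the equivalent is raising `num_candidates` and rescoring the oversampled set yourself):

```json
POST my-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [ ... ],
    "k": 10,
    "num_candidates": 100,
    "rescore_vector": {
      "oversample": 3.0
    }
  }
}
```

With `oversample: 3.0`, the search gathers roughly 3x `k` quantized candidates and rescores them against the full-fidelity vectors before returning the top `k`, which recovers most of the recall lost to binary quantization.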

Good luck - and we'd love your feedback if you start using it!
