Hi all,
I’m experimenting with the new vector quantization formats in Elasticsearch 8.x and trying to figure out at what dataset size BBQ (binary quantization) really starts to outperform scalar quantization.
What I’m comparing:
- ES814HnswScalarQuantizedVectorsFormat (scalar quantization, non-BBQ)
- ES816HnswBinaryQuantizedVectorsFormat (BBQ)
Both are used on dense_vector fields with an HNSW index, and my goal is to reduce memory and disk usage without sacrificing too much recall or latency.
My current mapping:

```json
{
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "cosine",
      "index_options": {
        "type": "bbq_hnsw"
      }
    }
  }
}
```
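For reference, the scalar-quantized baseline I'm comparing against uses the same mapping with `int8_hnsw` as the `index_options` type (the default quantized HNSW option in recent releases) — only the quantization type differs between the two indices:

```json
{
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "cosine",
      "index_options": {
        "type": "int8_hnsw"
      }
    }
  }
}
```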
What I'd like to know:
- Roughly how many vector documents (e.g. 100 K, 1 M, 10 M) you needed before seeing a noticeable improvement in query latency or resource usage with BBQ vs. scalar quantization.
- Any trade‑offs in recall or accuracy you observed at various scales.
- Tips on benchmarking methodology (e.g. how many query patterns to test, warm‑up strategies, monitoring settings).
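On the recall side, my current plan is to compute recall@k per dataset size: run the same query vector through the quantized kNN index and through an exact brute-force search (e.g. a script_score query over the same field), then compare the returned document IDs. A minimal sketch of the comparison step (the helper name is my own, nothing Elasticsearch-specific):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the quantized index found.

    approx_ids: doc IDs from the approximate kNN query (bbq_hnsw / int8_hnsw)
    exact_ids:  doc IDs from an exact search over the same vectors
    """
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Example: the quantized index recovers 8 of the true top-10 hits.
exact = [f"doc{i}" for i in range(10)]
approx = exact[:8] + ["doc42", "doc99"]
print(recall_at_k(approx, exact))  # 0.8
```

I'd average this over a few hundred held-out query vectors per scale, which is why I'm curious how many query patterns others found sufficient.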
Has anyone here benchmarked both formats in production or at scale? What dataset sizes or traffic volumes tipped the balance in favor of BBQ for you?
Thanks in advance for any insights or numbers you can share!