I've been running some tests on Better Binary Quantization (BBQ) in Elasticsearch and comparing it with the default configuration for dense vectors, but I'm not observing the expected differences in disk size or search performance.
Test Setup:
BBQ Index Configuration (my-index):
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index_options": {
          "type": "bbq_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  }
}
Default Index Configuration (my-index-2):
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "dims": 1024,
        "index_options": {
          "type": "int8_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  }
}
Problem:
I embedded 100K comments as 1024-dimensional vectors (IMS) and tested both configurations. However, disk usage and search time are almost identical, with no significant improvement from the BBQ configuration. In fact, the default configuration sometimes seems faster at search time.
Index Sizes:
- BBQ Index (my-index): 1.9GB
- Default Index (my-index-2): 1.9GB
As shown, there is no difference in disk size between the two configurations.
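For reference, here is a rough back-of-envelope estimate of the vector data alone for this test. It is only a sketch under some assumptions: as far as I understand, Elasticsearch keeps the raw float32 vectors on disk next to the quantized copies in both configurations, and I am ignoring the HNSW graph, per-vector correction terms, _source, and every other field. The helper name `vector_bytes` is just for illustration.

```python
# Rough estimate of on-disk vector storage for 100K vectors of 1024 dims.
# Assumption: raw float32 vectors are stored alongside the quantized copies.
# Ignored: HNSW graph, per-vector correction terms, _source, other fields.

NUM_VECTORS = 100_000
DIMS = 1024
MB = 1_000_000


def vector_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Bytes needed to store the vectors at the given bits per dimension."""
    return num_vectors * dims * bits_per_dim // 8


raw_float32 = vector_bytes(NUM_VECTORS, DIMS, 32)  # unquantized copy
int8_copy = vector_bytes(NUM_VECTORS, DIMS, 8)     # int8_hnsw quantized copy
bbq_copy = vector_bytes(NUM_VECTORS, DIMS, 1)      # bbq_hnsw 1-bit copy

int8_total = raw_float32 + int8_copy
bbq_total = raw_float32 + bbq_copy

print(f"raw float32:        {raw_float32 / MB:.1f} MB")
print(f"int8_hnsw estimate: {int8_total / MB:.1f} MB")
print(f"bbq_hnsw estimate:  {bbq_total / MB:.1f} MB")
print(f"difference:         {(int8_total - bbq_total) / MB:.1f} MB")
```

If these assumptions hold, the two indices would differ by only about 90 MB on disk, which a size rounded to one decimal place in GB can easily hide.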
Questions:
- Index Size Comparison: How can I accurately measure the size of each index (BBQ vs Default) to check for differences in disk usage?
- Performance Differences: Has anyone encountered similar results? What settings or tests can I adjust to identify any potential improvements with BBQ?
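For the size question, one approach I am considering is reading exact byte counts from the index stats API (`GET /<index>/_stats/store`) instead of relying on the rounded human-readable figure. Below is a minimal sketch of that. The cluster URL (`http://localhost:9200`, no authentication) and the helper names `fetch_store_stats`, `store_sizes`, and `report` are my own assumptions, not anything prescribed by Elasticsearch.

```python
import json
import urllib.request

# Assumption: a local cluster without security enabled; adjust as needed.
ES_URL = "http://localhost:9200"


def fetch_store_stats(indices):
    """Call GET /<indices>/_stats/store and return the parsed JSON body."""
    url = f"{ES_URL}/{','.join(indices)}/_stats/store"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def store_sizes(stats):
    """Map index name -> (primary shard bytes, total bytes incl. replicas)."""
    return {
        name: (
            body["primaries"]["store"]["size_in_bytes"],
            body["total"]["store"]["size_in_bytes"],
        )
        for name, body in stats["indices"].items()
    }


def report(stats):
    """Format one line per index with exact byte counts."""
    return [
        f"{name}: primaries={primary:,} B, total={total:,} B"
        for name, (primary, total) in sorted(store_sizes(stats).items())
    ]


# Usage against a live cluster:
#   for line in report(fetch_store_stats(["my-index", "my-index-2"])):
#       print(line)
```

Comparing the `primaries` figure keeps replicas out of the picture. Running a force merge (`POST /<index>/_forcemerge`) on both indices before measuring might also make the comparison fairer, since unmerged segments and deleted documents can inflate the store size.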