I am trying to integrate BBQ, specifically bbq_hnsw, into my existing index. However, I'm not seeing the performance benefits I would expect. This is the field mapping I am using:
{
    "type": "dense_vector",
    "dims": dims,
    "index": True,
    "similarity": "cosine",
    "index_options": {
        "type": "bbq_hnsw"
    }
}
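To compare the two builds side by side, a small helper along these lines could generate both variants of the mapping. This is only a sketch: the function name `dense_vector_mapping` and its parameters are made up for illustration, and 768 is the dims value of my model.

```python
from typing import Any, Optional


def dense_vector_mapping(dims: int, index_options_type: Optional[str] = None) -> dict[str, Any]:
    """Build a dense_vector field mapping, optionally with index_options.

    Hypothetical helper: it only assembles the dict shown above and does
    not talk to Elasticsearch.
    """
    mapping: dict[str, Any] = {
        "type": "dense_vector",
        "dims": dims,
        "index": True,
        "similarity": "cosine",
    }
    # Only add index_options when a type is requested, so the baseline
    # build keeps whatever default the cluster applies.
    if index_options_type is not None:
        mapping["index_options"] = {"type": index_options_type}
    return mapping


bbq_mapping = dense_vector_mapping(768, "bbq_hnsw")
baseline_mapping = dense_vector_mapping(768)
```

Both dicts differ only in the presence of index_options, which keeps the comparison between the two builds limited to that one setting.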
While I can see that the attribute exists in the Lucene _segments output, I see no improvement in my query time.
{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "indices": {
    "embedding_control_20250417_qaqkj3cw": {
      "shards": {
        "0": [
          {
            "routing": {
              "state": "STARTED",
              "primary": true,
              "node": "IhQGlYPITDmtzYIEHHIwLg"
            },
            "num_committed_segments": 1,
            "num_search_segments": 1,
            "segments": {
              "_16": {
                "generation": 42,
                "num_docs": 2172,
                "deleted_docs": 0,
                "size": "35.5mb",
                "size_in_bytes": 37258972,
                "committed": true,
                "search": true,
                "version": "9.12.0",
                "compound": true,
                "attributes": {
                  "Lucene99HnswVectorsFormat": "[ibmGraniteEmbedding]",
                  "Zstd814StoredFieldsFormat.mode": "BEST_COMPRESSION"
                }
              }
            }
          }
        ]
      }
    }
  }
}
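To check the attributes on every segment rather than by eye, something like the sketch below could walk the response. It assumes the response has the shape shown above; `segment_attributes` is a name I made up, not part of any client library, and the sample dict is a trimmed copy of my output.

```python
from typing import Any


def segment_attributes(segments_response: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Collect the "attributes" dict of every segment in a _segments response.

    Keys in the result are "index/shard/segment" strings.
    """
    found: dict[str, dict[str, str]] = {}
    for index_name, index_info in segments_response.get("indices", {}).items():
        for shard_id, copies in index_info.get("shards", {}).items():
            # Each shard id maps to a list of shard copies (primary and replicas).
            for copy in copies:
                for seg_name, seg in copy.get("segments", {}).items():
                    key = f"{index_name}/{shard_id}/{seg_name}"
                    found[key] = seg.get("attributes", {})
    return found


# Trimmed copy of the response above, kept only to exercise the function.
sample = {
    "indices": {
        "embedding_control_20250417_qaqkj3cw": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_16": {
                                "num_docs": 2172,
                                "attributes": {
                                    "Lucene99HnswVectorsFormat": "[ibmGraniteEmbedding]",
                                    "Zstd814StoredFieldsFormat.mode": "BEST_COMPRESSION",
                                },
                            }
                        }
                    }
                ]
            }
        }
    }
}

attrs = segment_attributes(sample)
for segment_key, segment_attrs in attrs.items():
    print(segment_key, segment_attrs)
```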
This is how my current query is set up. It works, but I see no difference in speed compared with a build that was embedded without the index_options. The build is 2,000 records and the model dims are 768.
"query": {
    "function_score": {
        "query": {
            "script_score": {
                "query": bool_query,
                "script": {
                    "source": f"if (doc['{self.model_info['field_name']}'].isEmpty()) {{ return 0.0; }} else {{ return cosineSimilarity(params.query_vector, '{self.model_info['field_name']}') + 1.0; }}",
                    "params": {
                        "query_vector": embedded_query
                    }
                }
            }
        },