I had a thought last night: wait, what if Lucene is deduplicating the vectors and I just didn't realize it could do that? As in, if you loaded two identical vectors into two separate fields, would it detect that and avoid storing the raw vectors twice?
I tested that too, and as you might expect, we don't dedup those. This is fun though.
mapping:
curl -XPUT --header 'Content-Type: application/json' "http://localhost:9200/test" -d '{
"mappings": {
"properties": {
"image-vector": {
"type": "dense_vector",
"dims": 64,
"similarity": "l2_norm",
"index": true,
"index_options": {
"type": "bbq_hnsw"
}
},
"image-vector2": {
"type": "dense_vector",
"dims": 64,
"similarity": "l2_norm",
"index": true,
"index_options": {
"type": "int8_hnsw"
}
}
}
}
}'
adding 10,000 docs, each with the same random vector in both fields:
VECTOR=$(python -c 'import numpy as np; print(np.random.random(64).tolist())');
seq 1 10000 | xargs -I % -P1 curl -XPOST --header 'Content-Type: application/json' "http://localhost:9200/test/_doc" -d "
{ \"image-vector\": $VECTOR,
\"image-vector2\": $VECTOR }
"
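The numbers below come from the disk usage API; something like this should reproduce them (a refresh and force-merge first keeps the segment count from skewing the picture, and `run_expensive_tasks=true` is required for this API):

```shell
# make all docs visible on disk, collapse to one segment, then analyze per-field usage
curl -XPOST "http://localhost:9200/test/_refresh"
curl -XPOST "http://localhost:9200/test/_forcemerge?max_num_segments=1"
curl -XPOST "http://localhost:9200/test/_disk_usage?run_expensive_tasks=true&pretty"
```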
relevant output of the _disk_usage API:
"image-vector": {
"total": "2.6mb",
"total_in_bytes": 2801631,
"inverted_index": {
"total": "0b",
"total_in_bytes": 0
},
"stored_fields": "0b",
"stored_fields_in_bytes": 0,
"doc_values": "0b",
"doc_values_in_bytes": 0,
"points": "0b",
"points_in_bytes": 0,
"norms": "0b",
"norms_in_bytes": 0,
"term_vectors": "0b",
"term_vectors_in_bytes": 0,
"knn_vectors": "2.6mb",
"knn_vectors_in_bytes": 2801631
},
"image-vector2": {
"total": "3.1mb",
"total_in_bytes": 3261630,
"inverted_index": {
"total": "0b",
"total_in_bytes": 0
},
"stored_fields": "0b",
"stored_fields_in_bytes": 0,
"doc_values": "0b",
"doc_values_in_bytes": 0,
"points": "0b",
"points_in_bytes": 0,
"norms": "0b",
"norms_in_bytes": 0,
"term_vectors": "0b",
"term_vectors_in_bytes": 0,
"knn_vectors": "3.1mb",
"knn_vectors_in_bytes": 3261630
}
math (rough per-field estimate: quantized vectors + HNSW graph overhead + the raw float32 copy):
# bbq_hnsw: 1 bit/dim plus ~14 bytes of correction data per vector
10_000 * (64/8+14) + 10_000 * 16 + 10_000 * 64 * 4 = 2940000
# int8_hnsw: 1 byte/dim per vector
10_000 * 64 + 10_000 * 16 + 10_000 * 64 * 4 = 3360000
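A quick sanity check of that arithmetic in Python — the per-vector terms are my reading of the estimate above (quantized bytes, ~16 bytes of HNSW graph overhead per vector, and a 4-byte float32 per dimension for the raw copy), so treat the breakdown as an assumption, not a format spec:

```python
docs, dims = 10_000, 64

# per-field on-disk estimate: quantized vector + graph overhead + raw float32 copy, per doc
def estimate(quantized_bytes, graph_bytes=16, raw_bytes_per_dim=4):
    return docs * (quantized_bytes + graph_bytes + dims * raw_bytes_per_dim)

bbq = estimate(dims // 8 + 14)   # 1 bit/dim plus ~14 bytes of correction data
int8 = estimate(dims)            # 1 byte/dim
print(bbq, int8)                 # 2940000 3360000
```

Both estimates land a few percent above the measured 2801631 and 3261630 bytes, which is about what you'd expect from a back-of-the-envelope graph-overhead guess.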