Based on my understanding of your comments, the raw values of the dense vector field are stored in Lucene and again in _source (unless they are excluded), so there is duplication, and this resulted in an additional 40 GB of storage on disk, right? How does the configuration of 1 replica affect storage on top of that?
What about the quantized vectors: will they also be duplicated in the replica shards?
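To reason about these questions, a rough per-copy estimate may help. The sketch below rests on assumptions that are not from this thread: raw float32 vectors take 4 bytes per dimension, int8 quantization adds roughly `dims + 4` bytes per vector, and HNSW graph links and the _source copy are not counted. It also relies on the fact that a replica is a full copy of its primary shard's files, so quantized vectors are replicated along with everything else. The helper names and the 1M-vector, 1024-dim inputs are hypothetical (1024 is only guessed from the index name).

```python
# Back-of-envelope storage estimate for a single dense_vector field.
# Assumptions (not from the thread): float32 raw vectors at 4 bytes/dim,
# int8 quantized copies at ~(dims + 4) bytes/vector. HNSW graph links and
# the _source copy are NOT counted.

FLOAT32_BYTES = 4
INT8_CORRECTION_BYTES = 4  # assumed per-vector correction term for int8


def vector_bytes_per_copy(num_vectors: int, dims: int, int8_quantized: bool = True) -> dict:
    """Approximate vector bytes held by ONE shard copy (a primary or a replica)."""
    raw = num_vectors * dims * FLOAT32_BYTES
    quantized = num_vectors * (dims + INT8_CORRECTION_BYTES) if int8_quantized else 0
    return {"raw_float32": raw, "int8_quantized": quantized, "total": raw + quantized}


def bytes_across_copies(per_copy_bytes: int, number_of_replicas: int) -> int:
    """A replica is a full copy of its primary's files, so everything scales together."""
    return per_copy_bytes * (1 + number_of_replicas)


if __name__ == "__main__":
    # Hypothetical inputs: 1M vectors of 1024 dims (dims guessed from the index name).
    est = vector_bytes_per_copy(num_vectors=1_000_000, dims=1024)
    print(est)  # {'raw_float32': 4096000000, 'int8_quantized': 1028000000, 'total': 5124000000}
    print(bytes_across_copies(est["total"], number_of_replicas=1))  # 10248000000
```

With `number_of_replicas: 1`, every figure (raw vectors, quantized vectors, and _source alike) roughly doubles across the cluster.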
Below is part of the response from the disk usage API:
{
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "file_flat_1024": {
    "store_size": "216.5gb",
    "store_size_in_bytes": 232469055909,
    "all_fields": {
      "total": "216.4gb",
      "total_in_bytes": 232444165546,
      "inverted_index": {
        "total": "2.9gb",
        "total_in_bytes": 3147344422
      },
      "stored_fields": "163.7gb",
      "stored_fields_in_bytes": 175830555711,
      "doc_values": "97.1mb",
      "doc_values_in_bytes": 101829561,
      "points": "92.2mb",
      "points_in_bytes": 96760144,
      "norms": "9.8mb",
      "norms_in_bytes": 10375998,
      "term_vectors": "0b",
      "term_vectors_in_bytes": 0,
      "knn_vectors": "49.5gb",
      "knn_vectors_in_bytes": 53257299710
    },
    "fields": {
      "_source": {
        "total": "163.5gb",
        "total_in_bytes": 175628654805,
        "inverted_index": {
          "total": "0b",
          "total_in_bytes": 0
        },
        "stored_fields": "163.5gb",
        "stored_fields_in_bytes": 175628654805,
        "doc_values": "0b",
        "doc_values_in_bytes": 0,
        "points": "0b",
        "points_in_bytes": 0,
        "norms": "0b",
        "norms_in_bytes": 0,
        "term_vectors": "0b",
        "term_vectors_in_bytes": 0,
        "knn_vectors": "0b",
        "knn_vectors_in_bytes": 0
      },
      "file_section_embedding": {
        "total": "49.5gb",
        "total_in_bytes": 53257299710,
        "inverted_index": {
          "total": "0b",
          "total_in_bytes": 0
        },
        "stored_fields": "0b",
        "stored_fields_in_bytes": 0,
        "doc_values": "0b",
        "doc_values_in_bytes": 0,
        "points": "0b",
        "points_in_bytes": 0,
        "norms": "0b",
        "norms_in_bytes": 0,
        "term_vectors": "0b",
        "term_vectors_in_bytes": 0,
        "knn_vectors": "49.5gb",
        "knn_vectors_in_bytes": 53257299710
      }
    }
  }
}
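To see how much of the 216.5gb store each field accounts for, the per-field `total_in_bytes` values can be divided by `store_size_in_bytes`. A small Python sketch, using only numbers copied from the response above (trimmed to the keys it reads); the helper name `field_shares` is made up for illustration.

```python
# Per-field share of the index store size, using figures copied from the
# _disk_usage response above (trimmed to the keys this function reads).

def field_shares(disk_usage: dict, index: str) -> dict:
    """Fraction of store_size_in_bytes attributed to each field."""
    idx = disk_usage[index]
    store = idx["store_size_in_bytes"]
    return {name: stats["total_in_bytes"] / store for name, stats in idx["fields"].items()}


response = {
    "file_flat_1024": {
        "store_size_in_bytes": 232469055909,
        "fields": {
            "_source": {"total_in_bytes": 175628654805},
            "file_section_embedding": {"total_in_bytes": 53257299710},
        },
    }
}

for name, share in field_shares(response, "file_flat_1024").items():
    print(f"{name}: {share:.1%}")
# _source: 75.5%
# file_section_embedding: 22.9%
```

Note that _source holds the whole original document, not only the vectors, so its share is an upper bound on how much of the store is duplicated vector data.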
By the way, judging from the disk usage API response, it seems to report disk usage for the primary shards only, right?
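One way to cross-check this is the `_cat/shards` API, which lists the on-disk store size of every shard copy, with `prirep` set to `p` for primaries and `r` for replicas. The sketch below sums those sizes per copy type from a response requested as `GET _cat/shards/file_flat_1024?format=json&bytes=b&h=shard,prirep,store`. The helper name and the sample rows are hypothetical, not output from this cluster.

```python
# Sum on-disk store size per copy type from the _cat/shards API, requested as:
#   GET _cat/shards/file_flat_1024?format=json&bytes=b&h=shard,prirep,store
# Each row describes one shard copy; prirep is "p" (primary) or "r" (replica).

def store_by_copy_type(cat_shards: list) -> dict:
    totals = {"p": 0, "r": 0}
    for row in cat_shards:
        store = row.get("store")
        if store is None:  # unassigned copies have no store size
            continue
        totals[row["prirep"]] += int(store)
    return totals


# Hypothetical rows, NOT real output from this cluster.
rows = [
    {"shard": "0", "prirep": "p", "store": "1000"},
    {"shard": "0", "prirep": "r", "store": "1010"},
    {"shard": "1", "prirep": "p", "store": "900"},
    {"shard": "1", "prirep": "r", "store": None},
]
print(store_by_copy_type(rows))  # {'p': 1900, 'r': 1010}
```

At the index level, `GET _cat/indices/file_flat_1024?v&h=index,pri.store.size,store.size` gives the same split: with one assigned replica, `store.size` should be roughly twice `pri.store.size`, and comparing the `_disk_usage` `store_size` against both columns shows which one it matches.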