I started an indexing operation via the Python client, and it was terminated abnormally. When I ran
curl --location --request POST 'http://127.0.0.1:9200/index_name.vec/_disk_usage?run_expensive_tasks=true'
to analyze the current index, I noticed a _recovery_source field that,
judging by its size, contains the raw dense_vector field data
in float representation and takes up a lot of space.
"index_vector": {
  "store_size": "29.3mb",
  "all_fields": {
    "total": "29.3mb",
    "inverted_index": {
      "total": "50.4kb"
    },
    "stored_fields": "23.2mb",
    "doc_values": "286.6kb",
    "points": "14.2kb",
    "norms": "1.3kb",
    "term_vectors": "0b",
    "knn_vectors": "5.7mb"
  },
  "_recovery_source": {
    "total": "21.8mb",
    "inverted_index": {
      "total": "0b"
    },
    "stored_fields": "21.8mb",
    "doc_values": "0b",
    "points": "0b",
    "norms": "0b",
    "term_vectors": "0b",
    "knn_vectors": "0b"
  },
  "_source": {
    "total": "1.3mb",
    "inverted_index": {
      "total": "0b"
    },
    "stored_fields": "1.3mb",
    "doc_values": "0b",
    "points": "0b",
    "norms": "0b",
    "term_vectors": "0b",
    "knn_vectors": "0b"
  }
}
Based on its name, I guess this field is used for some recovery operation on the index, but if it stores raw dense vectors without any compression, it is going to take up a lot of space.
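To put the numbers in perspective, here is a minimal sketch that converts the human-readable sizes from the output above into bytes and computes each field's share of store_size. The helper names `parse_size` and `share_of_store` are made up for this illustration, and the unit table assumes binary multiples (1kb = 1024 bytes); only the size strings themselves come from the output.

```python
import re

# Assumed unit multipliers (binary: 1kb = 1024 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_size(text):
    """Convert a size string such as '21.8mb' into an approximate byte count."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def share_of_store(field_total, store_size):
    """Return the fraction of the index store taken up by one field."""
    return parse_size(field_total) / parse_size(store_size)


# Values copied from the _disk_usage output in this post.
store_size = "29.3mb"
field_totals = {"_recovery_source": "21.8mb", "_source": "1.3mb"}

shares = {
    name: share_of_store(total, store_size)
    for name, total in field_totals.items()
}

# Print the fields from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%} of store_size")
```

With the figures above, _recovery_source alone accounts for roughly three quarters of the index, which is why I am concerned about it.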
So I was wondering what this field is used for, and how I can disable it or at least keep dense vectors from being included in it.