What is the _recovery_source field?

I started an indexing operation via the Python client, and it was terminated abnormally. When I ran

curl --location --request POST 'http://127.0.0.1:9200/index_name.vec/_disk_usage?run_expensive_tasks=true'

to analyze the current index, I noticed a _recovery_source field that, judging by its size, contains the raw dense_vector field data in float representation and takes up a lot of space.

"index_vector": {
    "store_size": "29.3mb",
    "all_fields": {
        "total": "29.3mb",
        "inverted_index": {
            "total": "50.4kb"
        },
        "stored_fields": "23.2mb",
        "doc_values": "286.6kb",
        "points": "14.2kb",
        "norms": "1.3kb",
        "term_vectors": "0b",
        "knn_vectors": "5.7mb"
    }

"_recovery_source": {
    "total": "21.8mb",
    "inverted_index": {
        "total": "0b"
    },
    "stored_fields": "21.8mb",
    "doc_values": "0b",
    "points": "0b",
    "norms": "0b",
    "term_vectors": "0b",
    "knn_vectors": "0b"
}

"_source": {
    "total": "1.3mb",
    "inverted_index": {
        "total": "0b"
    },
    "stored_fields": "1.3mb",
    "doc_values": "0b",
    "points": "0b",
    "norms": "0b",
    "term_vectors": "0b",
    "knn_vectors": "0b"
}

Based on its name, I can guess that this field is used for some recovery operations on the index, but if it stores raw dense vectors without any compression, it is going to take a lot of space.
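
To put the reported sizes in proportion, here is a minimal Python sketch that parses the size strings from the disk usage response and computes how much of the index a field accounts for. The helper names (parse_size, share, fetch_disk_usage) are made up for illustration, and fetch_disk_usage assumes an unsecured node at the given base URL; it only wraps the same _disk_usage call as the curl command.

```python
import json
import re
import urllib.request

# Sizes in the response are treated as powers of 1024 (an assumption of this sketch).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_size(text: str) -> float:
    """Convert a size string such as "21.8mb" to a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)(b|kb|mb|gb|tb|pb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def share(part: str, whole: str) -> float:
    """Return part / whole as a fraction, both given as size strings."""
    return parse_size(part) / parse_size(whole)


def fetch_disk_usage(base_url: str, index: str) -> dict:
    """POST to the disk usage API and decode the JSON body (needs a live node)."""
    url = f"{base_url}/{index}/_disk_usage?run_expensive_tasks=true"
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Shares computed from the sizes quoted above; no cluster is needed for this part.
print(f"{share('21.8mb', '23.2mb'):.0%} of stored fields")
print(f"{share('21.8mb', '29.3mb'):.0%} of the whole index")
```

With the numbers from this index, _recovery_source comes out at roughly 94% of the stored fields and roughly 74% of the total store size.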

So I was wondering what this field is used for, and how I can disable it or at least keep dense vectors from being included in it.

Recovery source is SUPPOSED to go away after segments are merged.

See: Indices with "_source.enabled: false" same size as indices with "_source.enabled: true" · Issue #41628 · elastic/elasticsearch · GitHub

It is effectively used by cross-cluster replication and shard disaster recovery.
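
If you want to trigger a merge explicitly rather than wait for background merges, the force merge API is one option. Below is a minimal Python sketch, assuming an unsecured node at the given base URL; build_force_merge_request is a hypothetical helper name. Whether the recovery source is actually dropped by the merge depends on the cluster's retention state, so treat this as something to try, not a guaranteed fix.

```python
import urllib.request


def build_force_merge_request(base_url: str, index: str,
                              max_num_segments: int = 1) -> urllib.request.Request:
    """Build a POST request for the force merge API of one index."""
    url = f"{base_url}/{index}/_forcemerge?max_num_segments={max_num_segments}"
    return urllib.request.Request(url, method="POST")


# Usage (needs a live node; host and index name are taken from the question):
#   request = build_force_merge_request("http://127.0.0.1:9200", "index_name.vec")
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode())
```

After the merge finishes, rerun the _disk_usage call and compare the _recovery_source size with the earlier value.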

Unfortunately it stays:
