What is taking up disk space if I disable indexing of all properties and _source?

Adnukator · February 28, 2022, 5:45am

I'm trying to restructure our Elasticsearch settings to be more space-efficient, even at the cost of convenience. I have a set of 150k documents from a live server that I'm using to measure the impact of different settings.

While trying to find the lowest possible space requirement (and admittedly making the stored data basically useless), I disabled indexing of all properties of the message and disabled _source as well. However, every message still takes up about 300 bytes, which adds up to 40MB for 150k "empty" documents (according to _cat/indices). If it makes any difference, I'm using elasticdump to move documents in 10k batches.

What exactly is being stored? Can I remove this overhead somehow?
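For reference, the per-document figure follows directly from the two numbers quoted above. This is only a rough sketch: it assumes the "40MB" reported by _cat/indices is 1024-based and that the index holds exactly 150,000 documents; the function name is made up for illustration.

```python
def bytes_per_doc(store_bytes: int, doc_count: int) -> float:
    """Average on-disk bytes per document."""
    return store_bytes / doc_count

# Figures from the post; the 1024-based unit is an assumption.
store_bytes = 40 * 1024 * 1024
doc_count = 150_000

print(round(bytes_per_doc(store_bytes, doc_count)))  # roughly 280 bytes per document
```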

If I do a search, every document looks just like this:
{
  "_index" : "index20",
  "_id" : "Pz9NMH8BsRzEHi_3CFJ-",
  "_score" : 1.0
},

This is how I set up the index before pushing in data:
{
  "mappings": {
    "_source": {"enabled": false},
    "dynamic": "false",
    "properties": {
      "@timestamp": {"enabled": false},
      "@version": {"enabled": false},
      "headers": {"enabled": false},
      "host": {"enabled": false},
      "message": {"enabled": false},
      "tags": {"enabled": false}
    }
  }
}
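A minimal sketch of how the same body could be generated and submitted from a script, so that the list of disabled fields is easy to vary between measurement runs. The helper names, the `http://localhost:9200` address, and the unsecured connection are assumptions for illustration; only the mapping content and the index name `index20` come from the post. The request is prepared but not sent.

```python
import json
import urllib.request

# Field names taken from the mapping in the post.
FIELDS = ["@timestamp", "@version", "headers", "host", "message", "tags"]

def build_index_body(field_names):
    """Return an index-creation body that disables _source and every listed field."""
    return {
        "mappings": {
            "_source": {"enabled": False},
            "dynamic": "false",
            "properties": {name: {"enabled": False} for name in field_names},
        }
    }

def build_create_request(base_url, index_name, body):
    """Prepare (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

body = build_index_body(FIELDS)
# Assumption: a local node without authentication.
request = build_create_request("http://localhost:9200", "index20", body)

print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
# To actually create the index: urllib.request.urlopen(request)
```

Dropping a name from `FIELDS` and re-creating the index gives one data point per field, which fits the "remove things gradually" approach described in this thread.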

According to _stats, these are the largest parts of the index:
"merges" : {
  "current" : 0,
  "current_docs" : 0,
  "current_size_in_bytes" : 0,
  "total" : 1,
  "total_time_in_millis" : 356,
  "total_docs" : 100000,
  "total_size_in_bytes" : 28525466,
  "total_stopped_time_in_millis" : 0,
  "total_throttled_time_in_millis" : 0,
  "total_auto_throttle_in_bytes" : 20971520
},
"bulk" : {
  "total_operations" : 15,
  "total_time_in_millis" : 4886,
  "total_size_in_bytes" : 195480032,
  "avg_time_in_millis" : 261,
  "avg_size_in_bytes" : 10347232
}

_disk_usage reports this as the biggest culprit:
"fields": {
  "_recovery_source": {
    "total": "37mb",
    "total_in_bytes": 38837261,
    "inverted_index": {
      "total": "0b",
      "total_in_bytes": 0
    },
    "stored_fields": "37mb",
    "stored_fields_in_bytes": 38837261,
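To put that number in context, here is a rough sketch comparing the `_recovery_source` figure with the total reported earlier. It assumes the 40MB from _cat/indices is 1024-based and that the index holds 150,000 documents; the function name is made up for illustration. As far as I understand it, `_recovery_source` is a stored copy of the original document that Elasticsearch keeps for replica recovery when `_source` is disabled, and it is expected to be pruned by later merges once it is no longer needed, but treat that as an assumption rather than something confirmed in this thread.

```python
MIB = 1024 * 1024

def share_of_store(field_bytes: int, store_bytes: int) -> float:
    """Fraction of the index store taken up by one field."""
    return field_bytes / store_bytes

# total_in_bytes for _recovery_source, from the _disk_usage output in the post.
recovery_source_bytes = 38_837_261
# About 40MB according to _cat/indices; the 1024-based unit is an assumption.
store_bytes = 40 * MIB
doc_count = 150_000

share = share_of_store(recovery_source_bytes, store_bytes)
per_doc = recovery_source_bytes / doc_count

print(f"_recovery_source share of store: {share:.0%}")
print(f"_recovery_source bytes per document: {per_doc:.0f}")
```

Under those assumptions, nearly all of the roughly 300 bytes per document is accounted for by this single stored field.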

warkolm · February 28, 2022, 6:46am

What's the point in storing it in Elasticsearch if you want to do this?

Adnukator · February 28, 2022, 6:56am

I wanted to remove _source from the data completely and store only a few integers plus a timestamp per message, but I couldn't understand why the reported stored size was still far larger than expected. After gradually removing indexed data to find where the extra space was coming from, I ended up in the state I'm in now. So this is not a configuration I want to use; it's just a data point in my measurements that I want to understand.

system · March 28, 2022, 6:57am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
