Tuning disk usage with _source, doc_values and store?

I have recently been working on tuning disk usage, following the doc: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/tune-for-disk-usage.html.

Then I tried storing the values with three different settings:

  1. _source enabled (doc_values=false, store=false)
  2. doc_values enabled (_source=false, store=false)
  3. store enabled (_source=false, doc_values=false)

In my testing, I found there is almost no difference in disk usage between these three settings, even though they store the field values differently (disk usage measured as described in the doc: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-stats.html).

But in my opinion, doc_values, which are stored in a column-oriented fashion, should use less disk space. Is there anything wrong with my setup?

How much data are you testing with? What is your use case? What is it you are trying to achieve?

My main purpose is to reduce disk usage, so I tried three different kinds of mapping settings. The amount of test data is about 100K documents.

Here are my three mapping settings:

  1. enable _source, disable doc_values and store
    {
      "settings": {
        "index.codec": "best_compression"
      },
      "mappings": {
        "_source": {
          "enabled": true
        },
        "properties": {
          "path": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": false
          }
        }
      }
    }

  2. enable doc_values, disable _source and store
    {
      "settings": {
        "index.codec": "best_compression"
      },
      "mappings": {
        "_source": {
          "enabled": false
        },
        "properties": {
          "path": {
            "type": "keyword",
            "index": false,
            "doc_values": true,
            "store": false
          }
        }
      }
    }

  3. enable store, disable _source and doc_values
    {
      "settings": {
        "index.codec": "best_compression"
      },
      "mappings": {
        "_source": {
          "enabled": false
        },
        "properties": {
          "path": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": true
          }
        }
      }
    }
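To test these mappings against a larger data set, one option is to generate synthetic documents and load them through the _bulk API. A minimal sketch in Python (the index name, `_doc` type, and the `path` value pattern are illustrative assumptions; on 6.x the bulk action line needs a `_type`, which you would drop on 7.x+):

```python
import json

def bulk_payload(index_name, num_docs):
    """Build an NDJSON body for the Elasticsearch _bulk API: one action
    line plus one source line per document, with a trailing newline."""
    lines = []
    for i in range(num_docs):
        # Action metadata line; "_type" is required on 6.x.
        lines.append(json.dumps({"index": {"_index": index_name,
                                           "_type": "_doc",
                                           "_id": str(i)}}))
        # Synthetic document matching the "path" keyword field above.
        lines.append(json.dumps({"path": "/var/log/app/file-%d.log" % i}))
    return "\n".join(lines) + "\n"

# Example: a payload ready to POST to /_bulk with
# Content-Type: application/x-ndjson.
payload = bulk_payload("disk-test", 100_000)
```

The same generator can be reused for all three index mappings so that each index receives identical documents and the store sizes are directly comparable.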

100k documents does not sound like a lot of data. I would recommend using a larger data set and making sure that you also index into a single shard, as compression efficiency generally improves with shard size. Also force merge down to a single segment once you are done, to get a fair comparison.
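For the comparison itself, the primary-shard store size can be pulled out of the indices stats API response (GET index/_stats/store). A small helper, assuming the standard response shape:

```python
def primary_store_bytes(stats):
    """Return the primary-shard store size in bytes from an indices
    stats response (GET index/_stats/store)."""
    return stats["_all"]["primaries"]["store"]["size_in_bytes"]

# Example fragment shaped like the stats API output.
sample = {"_all": {"primaries": {"store": {"size_in_bytes": 1048576}}}}
print(primary_store_bytes(sample))  # 1048576
```

Comparing the primaries figure (rather than the total, which includes replicas) keeps the three indices on an equal footing.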

This old blog post shows how we did a similar comparison for a much older version of Elasticsearch.

It is also worth noting that altering the settings you described will have an impact on how you can query and potentially also reprocess your data, so make sure any side effects of mapping changes are acceptable to your use case.

Thanks a lot, I'll try testing with many more docs.
