Tuning disk usage with _source, doc_values and store?

I have recently been working on tuning disk usage, following the doc: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/tune-for-disk-usage.html.

Then I tried storing the values with three different settings:

  1. _source enabled (doc_values=false, store=false)
  2. doc_values enabled (_source=false, store=false)
  3. store enabled (_source=false, doc_values=false)

In my testing, I found there is almost no difference in disk usage between these three settings, even though they store the field values differently (disk usage measured as described in the doc: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-stats.html).

But in my opinion, doc_values, which are stored in a column-oriented fashion, should use less disk space. Is there anything wrong with my setup?

How much data are you testing with? What is your use case? What is it you are trying to achieve?

My main purpose is to reduce disk usage, so I tried three different kinds of mapping settings. The amount of test data is about 100K documents.

Here are my three mapping settings:

  1. enable _source, disable doc_values and store
    {
      "settings": {
        "index.codec": "best_compression"
      },
      "mappings": {
        "_source": {
          "enabled": true
        },
        "properties": {
          "path": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": false
          }
        }
      }
    }

  2. enable doc_values, disable _source and store
    {
      "settings": {
        "index.codec": "best_compression"
      },
      "mappings": {
        "_source": {
          "enabled": false
        },
        "properties": {
          "path": {
            "type": "keyword",
            "index": false,
            "doc_values": true,
            "store": false
          }
        }
      }
    }

  3. enable store, disable _source and doc_values
    {
      "settings": {
        "index.codec": "best_compression"
      },
      "mappings": {
        "_source": {
          "enabled": false
        },
        "properties": {
          "path": {
            "type": "keyword",
            "index": false,
            "doc_values": false,
            "store": true
          }
        }
      }
    }
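To test these mappings against a larger data set, one option is to generate synthetic documents and load them through the _bulk API. A minimal sketch in Python (the index name, `_doc` type, and the `path` value pattern are illustrative assumptions; on 6.x the bulk action line needs a `_type`, which you would drop on 7.x+):

```python
import json

def bulk_payload(index_name, num_docs):
    """Build an NDJSON body for the Elasticsearch _bulk API: one action
    line plus one source line per document, with a trailing newline."""
    lines = []
    for i in range(num_docs):
        # Action metadata line; "_type" is required on 6.x.
        lines.append(json.dumps({"index": {"_index": index_name,
                                           "_type": "_doc",
                                           "_id": str(i)}}))
        # Synthetic document matching the "path" keyword field above.
        lines.append(json.dumps({"path": "/var/log/app/file-%d.log" % i}))
    return "\n".join(lines) + "\n"

# Example: a payload ready to POST to /_bulk with
# Content-Type: application/x-ndjson.
payload = bulk_payload("disk-test", 100_000)
```

The same generator can be reused for all three index mappings so that each index receives identical documents and the store sizes are directly comparable.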

100k documents does not sound like a lot of data. I would recommend using a larger data set and making sure that you also index into a single shard, as compression efficiency generally improves with shard size. Also force merge down to a single segment once you are done, to get a fair comparison.
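For the comparison itself, the primary-shard store size can be pulled out of the indices stats API response (GET index/_stats/store). A small helper, assuming the standard response shape:

```python
def primary_store_bytes(stats):
    """Return the primary-shard store size in bytes from an indices
    stats response (GET index/_stats/store)."""
    return stats["_all"]["primaries"]["store"]["size_in_bytes"]

# Example fragment shaped like the stats API output.
sample = {"_all": {"primaries": {"store": {"size_in_bytes": 1048576}}}}
print(primary_store_bytes(sample))  # 1048576
```

Comparing the primaries figure (rather than the total, which includes replicas) keeps the three indices on an equal footing.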

This old blog post shows how we did a similar comparison for a much older version of Elasticsearch.

It is also worth noting that altering the settings you described will have an impact on how you can query and potentially also reprocess your data, so make sure any side effects of mapping changes are acceptable to your use case.

Thanks a lot, I'll try testing with many more docs.
