Disk space consumption of Elasticsearch 5.3

hi all,
A few days ago I upgraded from Elasticsearch 2.3.3 to 5.3.0. While the upgrade went well, I have noticed increased disk space consumption -- my daily indices are now 1.6-1.7 times larger on disk. Is this something one should expect when running Elasticsearch 5? I have tried the "index.codec": "best_compression" setting, but it doesn't seem to help much. My installation is used for collecting log data, and since I have a lot of fields, I have set index.mapping.total_fields.limit to 20000. Could this have an adverse effect on index size? If not, are there any recommendations for reducing disk space consumption?
kind regards,
risto

Having such a large number of fields can have an impact on the size the data takes up on disk, especially if you have a lot of sparse fields. It is generally recommended to avoid high levels of sparsity.

Try grouping data that is similar in structure into the same index. If you have, for example, a type of data that has a limited number of fields and comes in large volumes, it may make sense to put it in a separate index, as that reduces the amount of data that suffers from sparsity.
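As a sketch of that idea, high-volume, uniformly structured data (say, firewall logs) could go into its own daily index with a much lower field limit, instead of sharing the 20000-field index with everything else. The index name and limit here are just examples:

```json
PUT firewall-2017.04.01
{
  "settings": {
    "index.mapping.total_fields.limit": 1000
  }
}
```

Because every document in that index uses roughly the same small set of fields, the sparsity penalty largely disappears there.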

The default mapping that Elasticsearch infers when it sees a new string field changed in 5.x: strings are now indexed both as analyzed text and as a keyword sub-field, which takes up more space but enables more features. I'd compare the mappings. If the 5.x ones have a lot of

"field_name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

then it is worth customizing the mappings of these text fields. If you don't need to sort or aggregate on a field, turn off its keyword sub-field. There are many more mapping optimizations you can get into, but this is the first thing I'd look at.
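For example, mapping such a field explicitly as plain text drops the extra keyword sub-field (the index and type names here are placeholders; in 5.x, mappings live under a type):

```json
PUT my-index
{
  "mappings": {
    "log": {
      "properties": {
        "field_name": {
          "type": "text"
        }
      }
    }
  }
}
```

With this mapping you can still run full-text searches on field_name, but you give up sorting and aggregating on it.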

You can use an index template to make sure new indices are set up the way you want them to be. Once the template is in place, check whether the new daily indices come out smaller.
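A minimal sketch of such a template, assuming daily indices named logstash-*: it applies best_compression and a dynamic template that maps all new string fields as text only, without the keyword sub-field (the template name and pattern are assumptions; note that 5.x uses "template" for the pattern, not "index_patterns"):

```json
PUT _template/logs_template
{
  "template": "logstash-*",
  "settings": {
    "index.codec": "best_compression"
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_as_text": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text"
            }
          }
        }
      ]
    }
  }
}
```

The template only affects indices created after it is installed, so the effect shows up with the next day's index.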

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.