Elasticsearch disk usage 1.x vs 2.x

(NIck L.) #1

Hi everyone - I'm hoping someone can shed some light on a stumper for me.

I've been migrating our log data from an ES 1.4 cluster to a shiny new 2.3 cluster. I'm pushing in quite a bit of Logstash data (150M+ documents per day), and I've noticed a huge discrepancy in per-index disk usage between 1.4 and 2.3. A daily index was ~200G on 1.4 and is over 450G on 2.3. I've made sure my templates are the same as on the previous version, and the mappings look fairly similar.

I've searched and searched but can't find any particular information on this - is there any known reason stores would be significantly larger on 2.x vs 1.x?

Thanks so much!

(Aaron Mildenstein) #2

A significant portion of that may be doc_values being on by default in 2.x:

"All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space."
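
As a rough illustration of what disabling doc values could look like for a Logstash-style setup, here is a minimal Python sketch that builds a 2.x-style index template body. The field names (`request_id`, `session_token`) and the helper `build_template` are hypothetical, and it assumes the fields are `not_analyzed` strings that are never sorted on, aggregated on, or read from scripts. It only prints the JSON and does not contact a cluster.

```python
import json


def build_template(pattern, no_doc_values_fields):
    """Build an ES 2.x-style index template body (hypothetical helper).

    Every field listed in no_doc_values_fields is mapped as a
    not_analyzed string with doc_values turned off.
    """
    properties = {
        name: {
            "type": "string",
            "index": "not_analyzed",
            "doc_values": False,  # skip the on-disk columnar copy for this field
        }
        for name in no_doc_values_fields
    }
    return {
        "template": pattern,  # index name pattern the template applies to
        "mappings": {"_default_": {"properties": properties}},
    }


# Hypothetical field names, chosen only for illustration.
template = build_template("logstash-*", ["request_id", "session_token"])
print(json.dumps(template, indent=2, sort_keys=True))
```

The printed body is the kind of document you would send to the index template API. Templates are applied when an index is created, so existing daily indices keep their current mappings and only newly created ones would pick up the change.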

(NIck L.) #3

Thanks so much! Disabling them definitely made an impact.
