We use Elasticsearch primarily for Logstash data, and with the recent upgrade to 2.0 we saw roughly a 2.5x increase in disk usage. Same data, same data types and fields.
logstash-2015.11.10 (before the upgrade) - 179.3m docs - 139.6GB
logstash-2015.11.16 (on 2.0) - 174.3m docs - 411.5GB
Not so much a question, but this is likely due to doc values being turned on by default for all not_analyzed fields in 2.0, whereas before we were only using doc values for the @timestamp field.
We're seeing the same results with ES 2.0, and we're now researching whether we really need doc_values at all. Right now I don't see any benefit, because we don't do much aggregation or sorting on the stored log data. It's just a waste of disk space for us.
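For reference, this is roughly what I'm considering: an index template that maps not_analyzed strings without doc values for new daily indices. Just a sketch; the template name is arbitrary, and you'd normally fold this into your existing Logstash template rather than layering another one on top.

```
curl -XPUT 'localhost:9200/_template/logstash_no_docvalues' -d '
{
  "template": "logstash-*",
  "order": 1,
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_without_doc_values": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": false
            }
          }
        }
      ]
    }
  }
}'
```

That only takes effect for indices created after the template is in place; existing indices keep whatever mapping they were created with.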
Did you try setting index.codec: "best_compression" for new indices? It can help minimize the disk impact of doc_values.
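For daily Logstash indices it's just an index template setting, something like this (the template name is only an example, and the codec applies only to indices created after the template exists):

```
curl -XPUT 'localhost:9200/_template/logstash_best_compression' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "index.codec": "best_compression"
  }
}'
```

The trade-off is somewhat slower stored-field reads and merges (DEFLATE instead of the default LZ4) in exchange for a smaller store size.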
Thanks for confirming that you saw the same thing, rusty. It's good to know it wasn't just something in our cluster going crazy. It's pretty nice how much more heap space is available, but it is a huge increase in disk usage.
I haven't tried index.codec: "best_compression" yet, but I was looking into it yesterday. Based on how this is looking, it's probably a worthwhile change to make this weekend.
It certainly can increase disk usage (if you didn't have doc values turned on before). However, many people are memory bound on their machines, and all that data that now lives on disk used to sit in heap memory, which is usually much more expensive. By moving it out of memory and onto disk, some systems also become much more stable by avoiding out-of-memory exceptions. But every system is different, and sometimes you have gobs of memory and no disk space left, so it's definitely something to be aware of!
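If you want to see where your cluster sits on that trade-off, the cat APIs make the comparison easy (host and port here just assume a local node):

```
# fielddata currently held on the heap, per node and field
curl 'localhost:9200/_cat/fielddata?v'

# on-disk store size per Logstash index, to compare before/after
curl 'localhost:9200/_cat/indices/logstash-*?v&h=index,docs.count,store.size'
```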