I am testing ES 5 now and copying some indexes from a live ES 2.2.2 cluster to my new test ES 5.0.1 cluster, using escp to dump from the old cluster and import into the new one.
I noticed straight away that on ES 5.0.1 the same index uses far more storage than it did on 2.2.2.
OLD: 3.6mb Primary Store Size
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open ossec-2016.04.20 4 1 7688 0 7.2mb 3.6mb
NEW: 8.2mb Primary Store Size
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open ossec-2016.04.20 9563yC4rT7eY7_bPqWZvdw 4 1 7688 0 16.4mb 8.2mb
I checked the documents using elasticdump to make sure nothing weird was happening to the actual documents themselves, but they appear to be completely identical.
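For reference, the spot-check was done with an elasticdump invocation along these lines (the host name and output file here are placeholders, not the actual values used):

```
# Dump the documents from the old cluster to a local file so they can be
# diffed against a dump taken from the new cluster.
elasticdump \
  --input=http://old-es:9200/ossec-2016.04.20 \
  --output=old-ossec.json \
  --type=data
```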
There are lots of different default options in 5.0, and I expect some of them are eating space. For example, dynamically created string fields are indexed twice: analyzed as the text type and, as a sub-field, not analyzed as the keyword type. Elasticsearch 5.0's defaults are very much "enable all the things, paying all the costs". I'd fiddle with the mappings until you have one that makes sense for your data.
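As a minimal sketch of what that fiddling could look like: if these fields are only ever filtered on exactly (common for log data like ossec), a dynamic template that maps new string fields to keyword only skips the analyzed text copy and its extra disk cost. The template name "strings_as_keywords" and the index name for the next day are illustrative, not anything Elasticsearch requires:

```
# Create the next day's index with a dynamic template so new string
# fields get a single not-analyzed keyword mapping instead of the
# default text-plus-keyword pair.
curl -XPUT 'localhost:9200/ossec-2016.04.21' -d '{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}'
```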
That is a very different number, though. More different than I'd expect, and it is hard to know why from here. Three things I'd do:
Index more data. Megabytes sound big but don't amount to much on the scale Elasticsearch operates at; 100mb of index wouldn't take too long to build and would make for a fairer comparison.
Compare the mappings between the two clusters.
Compare the number of segments. Was the old index _force_merged? See the example commands after this list.
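Hedged examples of those checks, with localhost:9200 standing in for each cluster's address:

```
# 1. Compare the mappings the two clusters generated for the index.
curl 'localhost:9200/ossec-2016.04.20/_mapping?pretty'

# 2. Compare segment counts; many small, unmerged segments inflate
#    store.size relative to a fully merged index.
curl 'localhost:9200/_cat/segments/ossec-2016.04.20?v'

# 3. Force-merge the new index down to one segment, as the old one may
#    have been, then re-check the sizes.
curl -XPOST 'localhost:9200/ossec-2016.04.20/_forcemerge?max_num_segments=1'
```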