Elasticsearch 5 indexes much bigger than on the old 2.2.2?

I am testing ES 5 and copying some indexes from a live ES 2.2.2 cluster to a new ES 5.0.1 test cluster, using escp to dump from the old one and import into the new one.

I noticed straight away that on ES 5.0.1 the same index uses far more storage than on 2.2.2:

OLD (2.2.2): 3.6mb primary store size

health status index            pri rep docs.count docs.deleted store.size pri.store.size
green  open   ossec-2016.04.20   4   1       7688            0      7.2mb          3.6mb

NEW (5.0.1): 8.2mb primary store size

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   ossec-2016.04.20 9563yC4rT7eY7_bPqWZvdw   4   1       7688            0     16.4mb          8.2mb
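Because both copies hold the same 7688 documents, bytes per document is a convenient way to compare them. The _cat/indices API accepts bytes=b (exact byte counts instead of rounded mb) and h= (column selection). A minimal sketch, assuming rows with the columns index, docs.count, pri.store.size; the helper name bytes_per_doc is hypothetical.

```shell
# Hypothetical helper: reads _cat/indices rows in the column order
#   index docs.count pri.store.size   (sizes in plain bytes, i.e. bytes=b)
# and prints the average primary-store bytes per document for each index.
# Rows whose docs.count is not a positive integer (e.g. a header) are skipped.
bytes_per_doc() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { printf "%s %d\n", $1, $3 / $2 }'
}
```

Usage, run once against each cluster: curl -s 'http://OLD-2.2.2:9200/_cat/indices/ossec-2016.04.20?h=index,docs.count,pri.store.size&bytes=b' | bytes_per_doc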

I checked the documents with elasticdump to make sure nothing weird was happening with the documents themselves, and they look identical to me:

./elasticdump --input=http://OLD-2.2.2:9200/ossec-2016.04.20 --output=/tmp/file.json >/dev/null && sort /tmp/file.json | md5sum
83a5e0d491cdb447bd48e76a923b4a3b -

./elasticdump --input=http://NEW-5.0.1:9200/ossec-2016.04.20 --output=/tmp/file.json >/dev/null && sort /tmp/file.json | md5sum
83a5e0d491cdb447bd48e76a923b4a3b -
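The check above sorts each dump before hashing because the two clusters can return documents in a different order. A minimal sketch of the same idea as reusable functions (dump_digest and compare_dumps are hypothetical names), assuming one document per line, as the commands above do, and GNU md5sum. Forcing LC_ALL=C keeps sort's ordering identical on both machines regardless of locale. A matching digest only shows that the documents are the same; it says nothing about how each cluster maps and stores them.

```shell
# Hypothetical helper: order-insensitive digest of a line-per-document dump.
# LC_ALL=C makes sort compare raw bytes, so the result does not depend on
# the locale of the machine running it.
dump_digest() {
  LC_ALL=C sort "$1" | md5sum | cut -d ' ' -f 1
}

# Hypothetical helper: prints "same" if both dumps contain exactly the same
# lines (in any order), otherwise "different".
compare_dumps() {
  if [ "$(dump_digest "$1")" = "$(dump_digest "$2")" ]; then
    echo same
  else
    echo different
  fi
}
```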

Is this related to some compression setting that I can't seem to find?
Or is it due to metadata changes in Lucene 6, which ES 5.0.1 uses?

Thanks

Many defaults changed in 5.0, and I expect some of them are eating space. For example, a dynamically mapped string field is now indexed twice: once analyzed, as a text field, and once not analyzed, as a keyword sub-field. Elasticsearch 5.0's defaults are very much "enable all the things, paying all the costs". I'd adjust the mappings until you have one that makes sense for your data.
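One way to avoid that double indexing on 5.x is an index template whose dynamic template maps new string fields to keyword only (roughly what a not_analyzed string was in 2.x). A minimal sketch, assuming the ossec-* index pattern and a hypothetical template name ossec; in 5.x the pattern key is template (6.0 renamed it to index_patterns). The hypothetical helper only prints the JSON body, so it can be inspected before it is sent.

```shell
# Hypothetical helper: prints an ES 5.x index template body for indexes
# matching ossec-*. The dynamic template maps every new string field to
# keyword only, instead of the 5.0 default of a text field plus a
# .keyword sub-field.
ossec_template_body() {
  cat <<'EOF'
{
  "template": "ossec-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}
EOF
}
```

To install it, something like: ossec_template_body | curl -XPUT 'http://NEW-5.0.1:9200/_template/ossec' -H 'Content-Type: application/json' -d @-. Templates only apply when an index is created, so push it before importing. Keyword-only fields cannot be searched as full text, so whether this fits depends on how the fields are queried.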

That said, the difference is larger than I'd expect, and it's hard to say why from here. Three things I'd do:

  1. Index more data. A few megabytes don't amount to much on the scale Elasticsearch is built for, and 100mb of index wouldn't take long to load.
  2. Compare the mappings.
  3. Compare the number of segments. Was the old index force-merged (_forcemerge)?
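For the segment check, a minimal sketch that counts primary-shard segments per index from _cat/segments output, assuming the columns are requested with h=index,shard,prirep,segment and no header row; primary_segments is a hypothetical helper.

```shell
# Hypothetical helper: counts primary-shard segments per index from
# _cat/segments rows in the column order  index shard prirep segment.
# Replica rows (prirep "r") are ignored; output is sorted by index name.
primary_segments() {
  awk '$3 == "p" { n[$1]++ } END { for (i in n) print i, n[i] }' | LC_ALL=C sort
}
```

For example: for host in OLD-2.2.2 NEW-5.0.1; do curl -s "http://$host:9200/_cat/segments/ossec-2016.04.20?h=index,shard,prirep,segment" | primary_segments; done. For the mappings, bash's diff <(curl -s 'http://OLD-2.2.2:9200/ossec-2016.04.20/_mapping?pretty') <(curl -s 'http://NEW-5.0.1:9200/ossec-2016.04.20/_mapping?pretty') works; expect type names to differ (2.x reports string, 5.x reports text or keyword). To compare sizes on equal footing, both copies can be force-merged first, e.g. curl -XPOST 'http://HOST:9200/ossec-2016.04.20/_forcemerge?max_num_segments=1', where HOST is a placeholder.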

Hi Nik

You were right: it was the mapping, or rather the missing template that I forgot to push before copying the index.

Without the template it must have used the "enable all the things, paying all the costs" defaults, which gave me such a big index.

After pushing my template from the old cluster to the new one, the size is now almost identical: 300K bigger, but I can live with that.

Thanks

Awesome! 300k is well within the noise I'd expect from differences in timing during the import process.
