Index Size: Elasticsearch 1.4 -> 2.1


(Alexander Ott) #1

We recently switched from ES 1.4 to ES 2.1. After recreating our indices with ES 2.1, they are more than twice as large as they were with ES 1.4. Any hints about this effect, or has anybody made the same observation?


(David Pilato) #2

Probably because of doc_values which are activated by default?


(Alexander Ott) #3

And in ES 1.4 doc_values weren't activated by default?


#4

Look at the thread "Elasticsearch 2.0 2.5X Disk Space"; it seems to be the same problem.


(Alexander Ott) #5

Thanks. I will check this.


(David Pilato) #6

No they were not.


(Alexander Ott) #7

Is it possible to disable doc_values by default as it was in 1.x? Or do we need to disable it on every field in our mapping now?


(David Pilato) #8

So you have not_analyzed fields but you don't use them for sorting or aggregations?

Yes, you have to define this for every field, but you can use dynamic templates.
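As a sketch of what such a dynamic template could look like in 2.x (the index, type, and template names here are made up), this maps every newly seen string field as not_analyzed with doc_values disabled:

```json
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "strings_without_doc_values": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": false
            }
          }
        }
      ]
    }
  }
}
```

Note that a dynamic template only affects fields created after it is in place; fields that already exist keep whatever mapping they already have.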


(Alexander Ott) #9

Now I have changed all our fields to doc_values: false, but it doesn't seem to work... Index size with ES 1.4 = 69 GB; index size with ES 2.1 = 172 GB. Could you have a look at our mapping? Is it possible to upload it here? The upload function only allows jpg, jpeg, png, gif :frowning:


(David Pilato) #10

You can pretty-format it and copy-paste it here if it's small.

If not, paste it on gist.github.com


(Alexander Ott) #11

Please have a look at https://gist.github.com/anonymous/c0ab1a97d655322cde55


(David Pilato) #12

Is it the same mapping you have for your 1.4 version?
The mapping looks good.

Did you index exactly the same data?


(Alexander Ott) #13

The mapping is the same except for the added "doc_values": false, and under

    "_source": {
      "enabled": true,
      "compressed": true
    }

I removed "compressed": true because it isn't supported anymore and compression is enabled by default. Isn't it?

The data is exactly the same, and that is what makes me wonder. It seems that the doc_values: false configuration doesn't work...


(David Pilato) #14

@jpountz Any idea about this?


(Adrien Grand) #15

Alexander, can you break down the size of your data directory by file extension for both versions? (e.g. how much disk space the .fdt files, .dvm files, .tim files, etc. are using)
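One way to produce that breakdown is a small find/awk pipeline. The sketch below builds a throwaway demo directory with fake segment files so it is self-contained; on a real node you would point `find` at your actual Elasticsearch data directory instead (the path and file names here are examples, not your data):

```shell
# Demo: sum file sizes by extension, as you would for an ES data directory.
DIR=$(mktemp -d)
printf '0123456789' > "$DIR/_0.fdt"   # 10 bytes: stored fields stand-in
printf '01234'      > "$DIR/_0.dvd"   # 5 bytes: doc-values stand-in
printf '012'        > "$DIR/_0.tim"   # 3 bytes: terms-dictionary stand-in

# %s = size in bytes, %f = basename (GNU find)
result=$(find "$DIR" -type f -name '*.*' -printf '%s %f\n' |
  awk '{ n = split($2, p, "."); bytes[p[n]] += $1 }
       END { for (e in bytes) printf "%d .%s\n", bytes[e], e }' |
  sort -rn)
echo "$result"   # prints "10 .fdt", "5 .dvd", "3 .tim", largest first
rm -rf "$DIR"
```

On a real data directory, the relative share of .dvd/.dvm (doc values) and .nvd/.nvm (norms) files is what would confirm where the extra space is going.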


(Alexander Ott) #16

At https://gist.github.com/anonymous/120f63fbad5939febd92 you can find 4 files.

The first two files list the files and their sizes inside the data directory of each ES 1.4 node.

The last two files list the files and their sizes inside the data directory of each ES 2.1 node.


(Adrien Grand) #17

The main problem seems to be that you have sparse analyzed string fields, i.e. fields that are only present in a minority of documents. Norms were stored entirely in memory up to and including 2.0, which could occasionally take a lot of memory. In 2.1, norms were moved to disk in order to reduce the memory requirements of Elasticsearch. However, while the new encoding requires much less memory (no memory at all, actually), it requires more disk space when fields are sparse.

We can look into compressing norms better on sparse fields, but this would not come for free: in particular, performance would be affected, and it would take quite some time to be released.

However, on your end you could look into modeling your documents so that you have fewer sparse analyzed string fields. Another option is to disable norms on your string fields if you don't need scoring.
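In the 2.x mapping syntax, disabling norms on a string field looks roughly like this (the index, type, and field names below are examples):

```json
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "description": {
          "type": "string",
          "norms": { "enabled": false }
        }
      }
    }
  }
}
```

Norms can be disabled on an existing field after the fact, but not re-enabled; the space is reclaimed gradually as segments merge.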


(system) #18