Index Size: Elasticsearch 1.4 -> 2.1

We recently switched from ES 1.4 to ES 2.1. After recreating our indices with ES 2.1, they are more than twice as large as they were with ES 1.4. Any hints about this effect, or has anybody made the same observation?

Probably because of doc_values which are activated by default?

And in ES 1.4 doc_values weren't activated by default?

Have a look at Elasticsearch 2.0 2.5X Disk Space; it seems to be the same problem.

Thanks. I will check this.

No they were not.

Is it possible to disable doc_values by default, as it was in 1.x? Or do we need to disable them on every field in our mapping now?

So you have not_analyzed fields but you don't use them for sorting or aggregations?

Yes, you have to define this for every field, but you can use Dynamic Templates.
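As a sketch, a dynamic template that disables doc_values on all not_analyzed string fields might look like this in a 2.x mapping (the type name `my_type` and template name are placeholders):

```json
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "strings_without_doc_values": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": false
            }
          }
        }
      ]
    }
  }
}
```

Note that doc_values only apply to not_analyzed fields, so a template like this covers exactly the fields where 2.x changed the default.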

Now I changed all our fields to doc_values: false, but it seems that it doesn't work... Index size with ES 1.4 = 69GB, index size with ES 2.1 = 172GB. Maybe you can have a look at our mapping? Is it possible to upload it here? The upload function only allows jpg, jpeg, png, gif :frowning:

You can pretty-format it and copy and paste it here if it is small.

If not, paste it on gist.github.com

Please have a look at https://gist.github.com/anonymous/c0ab1a97d655322cde55

Is it the same mapping you have for your 1.4 version?
The mapping looks good.

Did you index exactly the same data?

The mapping is the same except for "doc_values": false, and under

```
"_source": {
  "enabled": true,
  "compressed": true
}
```

I removed "compressed": true because it isn't supported anymore and compression is enabled by default. Isn't it?

The data is exactly the same, and this is what makes me wonder. It seems that the doc_values: false configuration doesn't work...

@jpountz Any idea about this?

Alexandre, can you break down the size of your data directory by file extension for both versions? (e.g. how much disk space the .fdt files, .dvm files, .tim files, etc. are using)
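One quick way to produce such a breakdown is to sum file sizes per extension with find and awk. This is just a sketch; the `DATA_DIR` path is an assumption and should point at your node's actual data directory (it also relies on GNU find's `-printf`):

```shell
# Sum the size of all index files per extension.
# DATA_DIR is a placeholder; point it at your Elasticsearch data directory.
DATA_DIR=${DATA_DIR:-/var/lib/elasticsearch}
find "$DATA_DIR" -type f -name '*.*' -printf '%s %f\n' 2>/dev/null |
  awk '{ext = $2; sub(/.*\./, "", ext); sizes[ext] += $1}
       END {for (e in sizes) printf "%12d  .%s\n", sizes[e], e}' |
  sort -rn
```

Running this on both the 1.4 and 2.1 data directories should make it obvious which file type (stored fields, doc values, terms dictionary, norms, ...) accounts for the growth.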

At https://gist.github.com/anonymous/120f63fbad5939febd92 you can find 4 files.

The first two files list the files and sizes inside the data directory of each ES 1.4 node.

The last two files list the files and sizes inside the data directory of each ES 2.1 node.

The main problem seems to be due to the fact that you have sparse analyzed string fields, i.e. fields that are only present in a minority of documents. Norms were entirely stored in memory up to and including 2.0, which could occasionally take a lot of memory. In 2.1, norms have been moved to disk in order to reduce the memory requirements of Elasticsearch. However, while the new encoding requires much less memory (no memory at all, actually), it also requires more disk space when fields are sparse.

We can look into better compressing norms on sparse fields, but this is not something that would come for free, in particular performance would be affected, and this would take quite some time to be released.

However, on your end, you could look into modeling your documents in such a way that you have fewer sparse analyzed string fields. Another option is to disable norms on your string fields if you don't need scoring.
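If scoring on those fields is indeed not needed, disabling norms per field might look like this in a 2.x mapping (the field name `my_field` is a placeholder):

```json
{
  "properties": {
    "my_field": {
      "type": "string",
      "norms": { "enabled": false }
    }
  }
}
```

With norms disabled, no per-document length normalization factor is stored for the field, so sparse fields no longer pay the per-document disk cost described above.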