Elastic 2.0 beta 'best_compression' vs default 1.6 compression

ziv2081 · July 22, 2015, 3:08pm

Hey,

I downloaded the master branch from git and installed the beta to compare 'best_compression' against the compression of our current platform, using our own data set.

Here is what I did: I indexed 1,840,000 documents, then added index.codec: "best_compression" to the config file, restarted, and ran the optimize API.
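For reference, the same steps can be expressed as HTTP requests. This is a hedged sketch, assuming a node on localhost:9200 and a hypothetical index named `test`. Instead of the node config file, the codec is set per index at creation time, and the 2.0-era `_optimize` endpoint with `max_num_segments=1` asks for a merge down to one segment (later releases rename it `_forcemerge`). The code only builds the requests; it does not send them.

```python
import json
import urllib.request

# Assumptions (not from the original post): node URL and index name are hypothetical.
BASE_URL = "http://localhost:9200"
INDEX = "test"


def create_index_request(codec: str = "best_compression") -> urllib.request.Request:
    """Build a PUT request that creates INDEX with the given index.codec setting."""
    body = json.dumps({"settings": {"index": {"codec": codec}}}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def optimize_request(max_num_segments: int = 1) -> urllib.request.Request:
    """Build a POST request for the 2.0-era _optimize API (merge down to N segments)."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_optimize?max_num_segments={max_num_segments}",
        data=b"",
        method="POST",
    )


if __name__ == "__main__":
    for req in (create_index_request(), optimize_request()):
        # Only prints; pass req to urllib.request.urlopen against a live node to send it.
        print(req.get_method(), req.full_url)
```

Setting the codec at creation time means every segment is written with it from the start, rather than relying on a later merge to rewrite existing segments.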

Unfortunately, I did not see much of a difference; in fact, the index was slightly bigger with 'best_compression'.

My 1.6.0 node runs on CentOS, and these are its index stats:
docs.count  docs.deleted  store.size  pri.store.size
1840000     0             152mb       152mb

The 2.0.0 beta runs on OS X, and these are its index stats:
docs.count  docs.deleted  store.size  pri.store.size
1840000     0             156mb       156mb
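To put the two rows side by side, here is a small sketch that parses them and computes the relative size change and the average on-disk bytes per document. It assumes `mb` means 1024 × 1024 bytes; since the `_cat` figures are rounded, the results are approximate.

```python
# Parse the two `_cat/indices`-style rows quoted above and compare them.
# Assumption: "mb" means mebibytes (1024 * 1024 bytes); the sizes shown are rounded.

UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size like '152mb' into bytes (multi-letter suffixes checked first)."""
    for suffix in ("kb", "mb", "gb", "b"):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNITS[suffix])
    raise ValueError(f"unrecognised size: {text}")


def parse_row(header: str, row: str) -> dict:
    """Zip a whitespace-separated header with its data row."""
    return dict(zip(header.split(), row.split()))


HEADER = "docs.count docs.deleted store.size pri.store.size"
v16 = parse_row(HEADER, "1840000 0 152mb 152mb")
v20 = parse_row(HEADER, "1840000 0 156mb 156mb")

size16 = parse_size(v16["store.size"])
size20 = parse_size(v20["store.size"])
docs = int(v16["docs.count"])

relative_change = (size20 - size16) / size16
bytes_per_doc = size16 / docs
print(f"2.0 beta vs 1.6: {relative_change:+.1%}")
print(f"1.6 bytes per document: {bytes_per_doc:.1f}")
```

With these rows it reports a change of about +2.6% and roughly 87 bytes per document on 1.6.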

My data set consists of 80,000 documents containing a mix of random ints and strings, duplicated 23 times.
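The numbers are consistent: 80,000 × 23 = 1,840,000. A sketch that builds a data set of that shape, assuming two hypothetical fields, `num` and `text` (the post does not give the real schema), and assuming the copies are indexed one full pass after another. In that order, each document's duplicate arrives 80,000 documents later.

```python
import random
import string
from typing import Dict, Iterator, List, Union

Doc = Dict[str, Union[int, str]]


def make_unique_docs(n: int, seed: int = 42) -> List[Doc]:
    """Build n documents mixing a random int and a random string (hypothetical fields)."""
    rng = random.Random(seed)
    return [
        {
            "num": rng.randint(0, 1_000_000),
            "text": "".join(rng.choices(string.ascii_lowercase, k=12)),
        }
        for _ in range(n)
    ]


def duplicated_stream(unique: List[Doc], copies: int) -> Iterator[Doc]:
    """Yield the full unique set `copies` times back to back (assumed indexing order)."""
    for _ in range(copies):
        yield from unique


if __name__ == "__main__":
    unique = make_unique_docs(80_000)
    total = sum(1 for _ in duplicated_stream(unique, 23))
    print(total)  # 1840000
```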

What am I doing wrong?
Or what should I expect?

rmuir · July 22, 2015, 10:48pm

don't try to compress random data.
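The point can be seen with any general-purpose compressor. A minimal sketch using Python's zlib (DEFLATE, the algorithm the Elasticsearch documentation names for `best_compression`): random bytes do not shrink and typically grow slightly, while redundant text shrinks dramatically. This says nothing about Lucene's on-disk format; it only illustrates the entropy argument.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


random_bytes = os.urandom(100_000)  # incompressible by construction
repetitive = b"user=42 status=ok path=/var/log/app.log\n" * 2_500  # highly redundant

print(f"random:     {ratio(random_bytes):.3f}")
print(f"repetitive: {ratio(repetitive):.3f}")
```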

ziv2081 · July 23, 2015, 9:04am

I'm not sure it counts as random when everything repeats so many times.
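Repetition only helps if the compressor can see the earlier copy. DEFLATE, for example, can only reference the last 32 KiB of input, so duplicates that sit further apart look random to it. The sketch below uses Python's zlib and lzma modules to show the same repeated data compressing very differently depending on window size. Carrying this over to Elasticsearch is an assumption: it suggests that duplicates spread far apart in indexing order may not help a codec that compresses stored fields in limited-size chunks.

```python
import lzma
import os
import zlib

# A 64 KiB block of random bytes repeated four times: highly repetitive overall,
# but each repeat starts 64 KiB after the previous one.
block = os.urandom(64 * 1024)
data = block * 4

# DEFLATE (zlib) can only reference the previous 32 KiB, so it cannot see the repeats.
deflate_ratio = len(zlib.compress(data, 9)) / len(data)

# LZMA's default dictionary is several MiB, so the repeats fall inside its window.
lzma_ratio = len(lzma.compress(data)) / len(data)

print(f"DEFLATE: {deflate_ratio:.3f}")
print(f"LZMA:    {lzma_ratio:.3f}")
```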

When I gzip the index, I get roughly a 1:3 compression ratio (the compressed size is about a third of the original). I expected something similar from 'best_compression'.
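To measure this per file rather than for the whole directory, here is a sketch of a helper that gzips one file and returns compressed size over original size. Which files to pass in is up to you; the helper streams the data, so large index files need not fit in memory.

```python
import gzip
import shutil
import sys
import tempfile
from pathlib import Path


def gzip_ratio(path: Path) -> float:
    """Return compressed size / original size after gzipping one file."""
    original = path.stat().st_size
    if original == 0:
        raise ValueError(f"{path} is empty")
    with tempfile.TemporaryFile() as tmp:
        # Stream the file through gzip instead of reading it all at once.
        with path.open("rb") as src, gzip.GzipFile(fileobj=tmp, mode="wb") as gz:
            shutil.copyfileobj(src, gz)
        # Closing GzipFile writes the trailer but leaves tmp open; tell() is its size.
        compressed = tmp.tell()
    return compressed / original


if __name__ == "__main__":
    # Usage: python gzip_ratio.py FILE [FILE ...]
    for name in sys.argv[1:]:
        p = Path(name)
        print(f"{p}: {gzip_ratio(p):.3f}")
```

A result near 0.33 corresponds to the 1:3 ratio described above.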
I also tried another test: I dumped the directory listing of my machine and indexed the full paths of all the files.
The text file holds 230k file paths, about 100 bytes per path on average; it is 23mb uncompressed and 1.5mb compressed.
After indexing this file, the index is also roughly 23mb.
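The figures are self-consistent: 230k paths × ~100 bytes ≈ 23 MB, and 23 MB → 1.5 MB is roughly 15:1. Path lists compress that well because neighbouring lines share long prefixes. A sketch that checks the arithmetic and reproduces the effect on hypothetical synthetic paths (the real paths are not in the post).

```python
import zlib

# Figures from the post: ~230k paths, ~100 bytes each, 23mb raw, 1.5mb compressed.
paths = 230_000
avg_len = 100
raw_bytes = paths * avg_len  # 23,000,000 bytes, matching the 23mb file
reported_ratio = 23 / 1.5    # about 15:1 for the compressed text file


def synthetic_paths(n: int) -> list:
    """Hypothetical paths with long shared prefixes, like a real directory dump."""
    return [
        f"/home/user/projects/app/src/module_{i % 200:03d}/component_{i:06d}.py"
        for i in range(n)
    ]


text = "\n".join(synthetic_paths(10_000)).encode("utf-8")
synthetic_ratio = len(zlib.compress(text, 9)) / len(text)
print(f"raw size estimate: {raw_bytes:,} bytes")
print(f"reported gzip ratio: {reported_ratio:.1f}:1")
print(f"synthetic zlib ratio: {1 / synthetic_ratio:.1f}:1")
```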
I've tried disabling _all and _source and setting index=not_analyzed, but none of that seems to affect the index size.
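For reference, a sketch of a 2.x-era mapping with those three changes, using a hypothetical type `paths` and field `path`. According to the Elasticsearch documentation, `index.codec` only changes how stored data is compressed (LZ4 by default, DEFLATE with `best_compression`), so with `_source` disabled there is little stored data left for the codec to act on. Also note that 2.x enables doc values by default on not_analyzed fields, which adds its own on-disk structure.

```python
import json

# Hypothetical type and field names; 2.x-era syntax ("string" type + "not_analyzed").
mapping = {
    "mappings": {
        "paths": {
            "_all": {"enabled": False},     # no catch-all field
            "_source": {"enabled": False},  # original JSON is not stored
            "properties": {
                "path": {"type": "string", "index": "not_analyzed"}
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```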
