Elastic 2.0 beta 'best_compression' vs. default 1.6 compression


(Ziv) #1

Hey,

I downloaded the master branch from git and installed the beta so I could compare 'best_compression' against our current platform's compression on our data set.

I indexed 1,840,000 documents, then added index.codec: "best_compression" to the config file, restarted, and ran the optimize API.
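The same experiment can also be run per index instead of through the node config file. The sketch below only builds the two REST requests (method, path, JSON body) and sends nothing. It assumes the 2.0-era `_optimize` endpoint with its `max_num_segments` parameter, and treats `index.codec` as a static setting supplied when the index is created. The index name `test` and the helper names are hypothetical.

```python
import json


def create_index_request(index, codec="best_compression"):
    """Build a create-index request that sets the static index.codec setting.

    The index name is a hypothetical placeholder; nothing is sent over the network.
    """
    body = {"settings": {"index.codec": codec}}
    return ("PUT", "/" + index, json.dumps(body))


def optimize_request(index, max_num_segments=1):
    """Build an optimize request that merges the index down to N segments,
    which rewrites the existing data into new segments."""
    path = "/%s/_optimize?max_num_segments=%d" % (index, max_num_segments)
    return ("POST", path, None)


if __name__ == "__main__":
    for method, path, body in (create_index_request("test"), optimize_request("test")):
        print(method, path, body or "")
```

Any HTTP client pointed at a 2.0 node could send these. The resulting size can then be checked with `_cat/indices`, which is where the columns below come from.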

Unfortunately, I didn't see much of a difference. In fact, the index was slightly bigger with 'best_compression'.

My 1.6.0 node runs on CentOS, and here is its index size:
docs.count  docs.deleted  store.size  pri.store.size
1840000     0             152mb       152mb

The 2.0.0 beta runs on OS X, and here is its index size:
docs.count  docs.deleted  store.size  pri.store.size
1840000     0             156mb       156mb
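To put the difference in numbers, the sketch below parses the two `_cat/indices`-style rows quoted above and computes the relative change in store.size. The helper names are hypothetical, and the size parser only handles the kb/mb/gb suffixes.

```python
def parse_cat_row(header, row):
    """Pair the column names of a _cat/indices header with one row of values."""
    return dict(zip(header.split(), row.split()))


def size_mb(value):
    """Convert a size string such as '152mb' to a float number of megabytes.

    Only the 'kb', 'mb' and 'gb' suffixes are handled in this sketch.
    """
    units = {"kb": 1.0 / 1024, "mb": 1.0, "gb": 1024.0}
    number, unit = value[:-2], value[-2:].lower()
    return float(number) * units[unit]


HEADER = "docs.count docs.deleted store.size pri.store.size"
v16 = parse_cat_row(HEADER, "1840000 0 152mb 152mb")  # 1.6.0, default codec
v20 = parse_cat_row(HEADER, "1840000 0 156mb 156mb")  # 2.0.0 beta, best_compression

before = size_mb(v16["store.size"])
after = size_mb(v20["store.size"])
change = (after - before) / before
print("relative change: %+.1f%%" % (100 * change))
```

For these rows it prints a change of about +2.6%, i.e. the 'best_compression' index came out roughly 4 MB larger.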

My data set consists of 80,000 documents containing a mix of random ints and strings, duplicated 23 times (80,000 × 23 = 1,840,000 documents).
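A data set of this shape can be reproduced for testing with something like the sketch below. The post does not give the schema, so the field names `num` and `text`, the 10-letter strings, and the choice to append each full copy after the previous one are all assumptions.

```python
import random
import string


def make_unique_docs(n, seed=0):
    """Generate n documents with a random int and a random 10-letter string.

    Field names 'num' and 'text' are hypothetical; the original data set's
    schema is not given in the post.
    """
    rng = random.Random(seed)
    docs = []
    for _ in range(n):
        docs.append({
            "num": rng.randint(0, 2**31 - 1),
            "text": "".join(rng.choice(string.ascii_lowercase) for _ in range(10)),
        })
    return docs


def duplicated(docs, copies):
    """Repeat the whole list `copies` times, one full copy after another."""
    return [dict(d) for _ in range(copies) for d in docs]
```

Under these assumptions, `duplicated(make_unique_docs(80000), 23)` yields 1,840,000 documents in which each document's repeat sits 80,000 positions after its previous occurrence.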

What am I doing wrong?
Or what should I expect?


#2

Don't try to compress random data.
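A quick way to see the point: compress random bytes and highly repetitive bytes with Python's `zlib` (DEFLATE) and compare the ratios. This is a general illustration using zlib as a stand-in, not a measurement of Lucene's stored-fields compression.

```python
import os
import zlib


def ratio(data):
    """Return original size divided by zlib (DEFLATE) compressed size."""
    return len(data) / len(zlib.compress(data, 9))


random_bytes = os.urandom(1 << 20)                        # 1 MiB of random bytes
repetitive = b"user=alice status=ok code=200\n" * 35000   # ~1 MiB of one repeated line

print("random:     %.2f" % ratio(random_bytes))
print("repetitive: %.2f" % ratio(repetitive))
```

The random input stays at a ratio of about 1 (it can even grow slightly), while the repetitive input shrinks by orders of magnitude.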


(Ziv) #3

I'm not sure you can call it random when every document is repeated so many times.
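Repetition only helps an LZ77-style compressor such as DEFLATE when the earlier copy is still inside its window (32 KiB for DEFLATE). The sketch below uses Python's `zlib` as a stand-in. It does not model how Lucene chunks stored fields. It compresses 23 copies of a random block twice: once with a block smaller than the window and once with a block larger than it. If the 23 copies were indexed one full set after another, each repeat sits 80,000 documents away from its previous occurrence. That is far closer to the second case.

```python
import os
import zlib

WINDOW = 32 * 1024  # DEFLATE can only reference data up to 32 KiB back


def compressed_size(data):
    """Size in bytes of the zlib (DEFLATE) output at maximum compression."""
    return len(zlib.compress(data, 9))


near = os.urandom(8 * 1024)    # random block smaller than the window
far = os.urandom(64 * 1024)    # random block larger than the window
assert len(near) < WINDOW < len(far)

# Same 23 copies in both cases; only the distance between repeats differs.
print("near repeats:", compressed_size(near * 23), "bytes for", len(near) * 23)
print("far repeats: ", compressed_size(far * 23), "bytes for", len(far) * 23)
```

The near-repeat input compresses to little more than one copy of the block. The far-repeat input stays at essentially its full size, because no earlier copy is ever visible to the compressor.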

When I gzip the index, I get a 1:3 compression ratio, so I expected something similar with 'best_compression'.
I also tried another test: I dumped the directory listing of my machine and indexed the full paths of all the files.
The text file holds 230k file paths with an average length of 100 bytes; it is 23 MB raw and 1.5 MB compressed.
After indexing it, the index is also roughly 23 MB.
I tried disabling _all and _source and setting index to not_analyzed, but none of this seems to affect the index size.
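For reproducibility, the create-index body for the directory-listing test could look like the sketch below. It uses the 2.x mapping syntax (`_all` and `_source` disabled, `"index": "not_analyzed"` on a string field) together with the codec setting. The type name `file`, the field name `path`, and the helper name are hypothetical. The sketch only builds and prints the JSON.

```python
import json


def file_paths_index_body(codec="best_compression"):
    """Build a create-index body for the directory-listing test.

    The type name 'file' and field name 'path' are hypothetical. The mapping
    options follow the 2.x syntax: _all and _source disabled, and the path
    indexed as a single not_analyzed term.
    """
    return {
        "settings": {"index.codec": codec},
        "mappings": {
            "file": {
                "_all": {"enabled": False},
                "_source": {"enabled": False},
                "properties": {
                    "path": {"type": "string", "index": "not_analyzed"},
                },
            }
        },
    }


print(json.dumps(file_paths_index_body(), indent=2))
```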

