On Wed, Jan 23, 2013 at 10:48 AM, DH email@example.com wrote:
As I understood, it makes no impact on the search itself. The inverted
indexes are not compressed, so, no decompression has to be made.
It's true that Lucene doesn't use a general-purpose compression
algorithm to compress the inverted index, but it tries to use a very
compact representation (based on delta-encoding and bit-packing
since Lucene 4.1 and variable-length encoding in older versions)
so it can be likened to compression. The good news is that the
inverted index in Lucene 4.1 is faster (see annotation AB on ) and
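To make the "compact representation" point concrete, here is a minimal sketch of delta-encoding followed by variable-length byte encoding for a sorted list of doc IDs. It only illustrates the general idea. It is not Lucene's actual on-disk format, and the function names `encode_postings` and `decode_postings` are made up for this example.

```python
def encode_postings(doc_ids):
    """Delta-encode sorted doc IDs, then write each gap as a variable-length integer."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous  # small when doc IDs are close together
        previous = doc_id
        # 7 payload bits per byte; the high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild the absolute doc IDs from the gaps."""
    doc_ids = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set, keep reading this gap
        else:
            current += gap
            doc_ids.append(current)
            gap = 0
            shift = 0
    return doc_ids


doc_ids = [3, 7, 8, 300, 100000]
encoded = encode_postings(doc_ids)
print(len(encoded), "bytes instead of", 4 * len(doc_ids), "with fixed 32-bit integers")
```

Reading such a list back is a cheap sequential pass over the bytes, which is why it differs from running a general-purpose decompressor over the index.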
When the results are given, however, each hit has to be decompressed. (that
is, if I specifically asked for a size of 10, there will be only 10
decompression operations, even if several million docs match the query).
This should be, for search operation, the only drawback, performance wise.
(as I understood, quite a minimal one).
Is that correct?
This is correct.
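A rough sketch of why the cost depends on the requested size rather than on the number of matches. It assumes one compressed blob per document and uses zlib from the Python standard library as a stand-in codec. The `StoredFields` class and `search` function are hypothetical and do not reflect Elasticsearch or Lucene internals.

```python
import zlib


class StoredFields:
    """Toy store that keeps each document's source as a compressed blob."""

    def __init__(self):
        self._blobs = []
        self.decompressions = 0  # counter, only here to make the cost visible

    def add(self, source):
        self._blobs.append(zlib.compress(source.encode("utf-8")))
        return len(self._blobs) - 1  # doc ID

    def load(self, doc_id):
        self.decompressions += 1
        return zlib.decompress(self._blobs[doc_id]).decode("utf-8")


def search(store, matching_doc_ids, size):
    """Matching touches only doc IDs; the source is loaded for the top `size` hits."""
    return [store.load(doc_id) for doc_id in matching_doc_ids[:size]]


store = StoredFields()
all_ids = [store.add('{"title": "doc %d"}' % i) for i in range(100000)]
hits = search(store, all_ids, size=10)
print(len(all_ids), "matches,", store.decompressions, "decompressions")
```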
My indexes are composed of several million rather big docs (500+ fields,
15+ nested collections). For each indexation, the source will have to be
compressed, at a performance loss.
Thus, the bigger drawback to compression would be the indexation
Is that correct?
If the compression algorithm is lightweight (it's the case for both
LZF (used by Elasticsearch) and LZ4 (used by Lucene 4.1)), it won't
necessarily be the indexing bottleneck, especially if your analysis
chain is costly. Moreover, given that it reduces the amount of I/O to
perform, it could make indexing faster on slow disks.
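To get a feel for the I/O side of this trade-off, one could compare the number of bytes that would be written with and without compression. The sketch below uses zlib at its fastest level as a stand-in, since LZF and LZ4 are not in the Python standard library. The document shape (500 fields) is invented to resemble the case described above, and real ratios depend entirely on your data.

```python
import json
import zlib


def bytes_to_write(docs, compress):
    """Total size of the serialized sources, optionally compressed one by one."""
    total = 0
    for doc in docs:
        raw = json.dumps(doc).encode("utf-8")
        # Level 1 favors speed over ratio, closer in spirit to a lightweight codec.
        total += len(zlib.compress(raw, 1)) if compress else len(raw)
    return total


# Hypothetical documents: 500 fields each, with repetitive names and values.
docs = [{"field_%d" % i: "value %d" % (i % 7) for i in range(500)} for _ in range(100)]

raw_size = bytes_to_write(docs, compress=False)
compressed_size = bytes_to_write(docs, compress=True)
print("raw:", raw_size, "bytes, compressed:", compressed_size, "bytes")
```

Timing the same loop on your own documents and disks is the only reliable way to tell whether the CPU cost or the saved I/O dominates.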
On Thu, Jan 31, 2013 at 11:49 AM, Jilles van Gurp wrote:
The impact of compression is in my view generally worth it. People typically
overestimate the amount of CPU it takes to compress/decompress and
underestimate the effect of cutting a large percentage of disk/network IO.
You have to benchmark of course but my experience with lucene and solr is
that things are fine as long as indices and other data structures fit in
memory. Especially on large indices, limiting disk IO to the bare minimum
can make a lot of difference. IO tends to be the limiting factor on index
size, not CPU. So less IO is a good thing.
I can't agree more, this is precisely what motivated me to make stored
fields compressed by default in Lucene 4.1!