Store Compression

I have my ES cluster stored on a Pure Storage array. The Pure array has inline dedup and compression, and it performs best when the data written to it is not already compressed or encrypted. The ES compression feature is interfering with the Pure's ability to dedup. Is there any way to disable ES compression? We are using ES 1.7.1.

No, there is no way to disable Lucene compression at the moment. You can only choose the type and level of compression (LZ4 vs DEFLATE).

I suspect compression would not be much better with Pure anyway. Elasticsearch / Lucene uses a variety of targeted compression techniques, because the nature of the data is known upfront which allows extra optimization. These targeted compression methods are usually superior to more general purpose algorithms simply because the algorithm knows what to expect.

For example, the posting lists in Lucene are already very minimal due to the nature of the data structure (e.g. a token-to-docID mapping naturally de-duplicates the data). The posting list is then stored as a sorted list of packed integers. Term dictionaries are sorted, prefix-encoded strings with an associated packed ordinal map.
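To make that concrete, here is a rough Python sketch of the idea, not Lucene's actual on-disk format. It builds a token-to-docID mapping and then stores each sorted posting list as gaps between neighbouring IDs, on the assumption that small gaps need fewer bits when packed. The function names (`build_postings`, `delta_encode`, `bits_per_value`) are made up for illustration.

```python
from collections import defaultdict


def build_postings(docs):
    """Map each token to the sorted list of doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            # A set means a token repeated within a doc is recorded once.
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def delta_encode(sorted_ids):
    """Keep the first ID, then the gap to each following ID."""
    if not sorted_ids:
        return []
    gaps = [b - a for a, b in zip(sorted_ids, sorted_ids[1:])]
    return [sorted_ids[0]] + gaps


def bits_per_value(values):
    """Bits needed to bit-pack every value at one fixed width."""
    return max(1, max(values).bit_length())


docs = ["error disk full", "error network down", "disk replaced", "error disk full"]
postings = build_postings(docs)
print(postings["error"])                  # doc IDs containing "error"

ids = [1000, 1003, 1010]
print(delta_encode(ids))                  # first ID followed by gaps
print(bits_per_value(ids))                # width needed for the raw IDs
print(bits_per_value(delta_encode(ids)[1:]))  # width needed for the gaps only
```

The word "error" appears in three documents but is stored once, with three small integers next to it. There is not much left for a block-level deduper to find in that.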

If you use Doc Values, there are a number of tricks used to encode the data compactly: table encoding if the cardinality is low, GCD encoding if the values share a common divisor, offset encoding if they do not, etc.
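A minimal Python sketch of how such a choice could be made for a column of integers is below. This is my own simplification and not Lucene's code: the `choose_encoding` name, the `max_table_size` threshold, and the decision to compute the divisor on the values after subtracting the minimum are all assumptions for illustration.

```python
from functools import reduce
from math import gcd


def choose_encoding(values, max_table_size=256):
    """Pick a compact encoding for a non-empty list of integers.

    Returns (name, header, encoded), where `encoded` holds the small
    integers that would then be bit-packed.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_table_size:
        # Table encoding: store each distinct value once, then ordinals.
        ordinal = {v: i for i, v in enumerate(distinct)}
        return "table", distinct, [ordinal[v] for v in values]

    lo = min(values)
    divisor = reduce(gcd, (v - lo for v in values))
    if divisor > 1:
        # GCD encoding: every (value - lo) is a multiple of divisor.
        return "gcd", (lo, divisor), [(v - lo) // divisor for v in values]

    # Offset encoding: just subtract the minimum.
    return "offset", lo, [v - lo for v in values]


# Tiny thresholds so each branch is exercised with short inputs.
print(choose_encoding([10, 20, 10, 30], max_table_size=4))
print(choose_encoding([1000, 4000, 7000, 10000, 13000], max_table_size=2))
print(choose_encoding([100, 101, 103, 107], max_table_size=2))
```

In every branch the numbers left to store are much smaller than the originals, which is the point: the encoder exploits structure that a general purpose compressor only sees as bytes.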

The only real savings, I think, would be on the original source JSON. Lucene compresses the source in chunks using LZ4 or DEFLATE, which provides a good tradeoff between speed and compression ratio. It's unclear whether Pure would do better: the smallest dedupe chunk size in Pure is 4KB, so only very large documents could be deduped. And since Pure's post-dedupe compression algorithm is LZO, it would have a compression profile relatively similar to LZ4 (and generally worse than DEFLATE).
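To illustrate the chunk-size point, here is a toy Python check. It is deliberately the most generous model I can think of (any identical 4KB run of bytes, at any offset, counts as dedupable) and it says nothing about how Pure really works. The helper names and the random filler data are made up for the example.

```python
import random

BLOCK = 4096  # the 4KB minimum dedupe chunk size mentioned above


def filler(seed, n):
    """Deterministic pseudo-random bytes standing in for unrelated data."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def has_dedupable_run(a, b, block=BLOCK):
    """True if some block-sized run of bytes in `a` also occurs anywhere in `b`."""
    return any(a[i:i + block] in b for i in range(len(a) - block + 1))


small_doc = filler(1, 1024)   # a 1KB document stored in two places
large_doc = filler(2, 8192)   # an 8KB document stored in two places

small_a = filler(3, 3000) + small_doc + filler(4, 3000)
small_b = filler(5, 5000) + small_doc + filler(6, 1000)
large_a = filler(3, 3000) + large_doc + filler(4, 3000)
large_b = filler(5, 5000) + large_doc

print(has_dedupable_run(small_a, small_b))  # duplicate is shorter than one block
print(has_dedupable_run(large_a, large_b))  # duplicate spans at least one block
```

Even with that generous matching, the duplicated 1KB document never fills a whole 4KB block on its own, so the surrounding bytes decide whether the block matches.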

But that's just conjecture on my part. You could open a ticket on the GitHub project asking for an option to disable compression, but I suspect it will receive the same answer as this older ticket.

I have the same issue with the same Pure array. Were you ever able to find a solution to this?
