Compression in Elasticsearch documents

I would like to know whether Elasticsearch documents/indices are stored in
compressed format on disk. If so, what compression options are available,
and what are their performance overheads?

Are these compression options configurable?

Thanks
Ajay

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8c63c25f-4f49-47f4-8d0a-772d3301f45c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

Data are both duplicated to suit different access patterns and compressed.
So many compression algorithms are in place that it would be hard to list
them all, but for instance we use Frame-Of-Reference compression for
postings lists, LZ4 for the document store, bit packing for numeric doc
values, ...
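
To give a feel for what Frame-Of-Reference compression with bit packing means
for a postings list, here is a toy sketch. It is only an illustration of the
idea (delta-encode sorted document IDs, then store each block of deltas with
the smallest bit width that fits), not Lucene's actual on-disk format. The
function names and the block size of 4 are made up for the example.

```python
def pack_block(values):
    """Pack non-negative ints into one integer using a shared bit width."""
    bits = max(1, max(values).bit_length())
    packed = 0
    for i, value in enumerate(values):
        packed |= value << (i * bits)
    return bits, len(values), packed


def unpack_block(bits, count, packed):
    """Reverse pack_block: extract `count` values of `bits` bits each."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]


def encode_postings(doc_ids, block_size=4):
    """Delta-encode sorted doc IDs, then bit-pack the deltas block by block."""
    deltas, previous = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return [pack_block(deltas[i:i + block_size])
            for i in range(0, len(deltas), block_size)]


def decode_postings(blocks):
    """Unpack every block and rebuild the doc IDs with a running sum."""
    doc_ids, current = [], 0
    for bits, count, packed in blocks:
        for delta in unpack_block(bits, count, packed):
            current += delta
            doc_ids.append(current)
    return doc_ids


# Hypothetical postings list: sorted, non-negative document IDs.
doc_ids = [3, 7, 8, 15, 1000, 1001, 1003, 1010]
blocks = encode_postings(doc_ids)
packed_bits = sum(bits * count for bits, count, _ in blocks)
print(packed_bits, "bits packed vs", 32 * len(doc_ids), "bits as 32-bit ints")
```

Each block pays only for its own largest delta, so one big gap between
document IDs does not inflate the rest of the list.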

There are no configuration options for compression, besides disabling
features that you don't need (such as norms on fields that you don't score
on). The next major version of Elasticsearch (2.0) will, however, add a
setting to enable heavier compression (which in practice will use DEFLATE
instead of LZ4 for the document store). See pull request #8863 in
elastic/elasticsearch, "Add `best_compression` option for indices" by rmuir.
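
As a rough sketch of how the heavier compression could be requested once it
is available: in released 2.x versions the option is exposed as the
`index.codec` index setting, which is an assumption relative to this thread
since 2.0 was not out yet when it was written. The snippet only builds the
request body for creating an index; the index you send it to is up to you.

```python
import json

# Settings body for an index-creation request.
# Assumption: the option is exposed as "index.codec", as in released 2.x.
create_index_body = {
    "settings": {
        "index": {
            # Heavier (DEFLATE-based) compression for the document store,
            # instead of the default LZ4.
            "codec": "best_compression",
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```

Expect a trade-off: smaller stored fields on disk in exchange for more CPU
when documents are written and fetched.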

--
Adrien

Hi Adrien,

Thanks for the quick response.

When I loaded nearly 45M documents of test data with 3 replicas (each
document approximately 2K+ bytes in size), I got the following storage info:

health status index       pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_insert   5   3   44985382            0    414.9gb        106.4gb

This indicates there was hardly any compression on physical storage, hence
my question. How do I estimate how much storage would be used for X
documents with an average size of Y kilobytes each? From the above result,
there appears to be no compression at all on the stored data.

Thanks

Ajay

On Tue, Apr 14, 2015 at 7:35 PM, ajay.bh111@gmail.com wrote:

This indicates there was hardly any compression on physical storage, hence
my question. How do I estimate how much storage would be used for X
documents with an average size of Y kilobytes each? From the above result,
there appears to be no compression at all on the stored data.

Compression ratios depend so much on the data that you can't really know
what the compression ratio will be without indexing sample documents.
However, once you have indexed enough documents (e.g. 100k), you can expect
the store size to keep growing linearly with the number of documents.
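
A minimal sketch of that linear extrapolation, using the figures reported for
the test_insert index earlier in the thread. It rests on several assumptions:
that "gb" in the output means 1024^3 bytes, that "approx 2K+" means roughly
2048 bytes of raw JSON per document, and that every replica takes about as
much space as its primary. The function names and the 100M target are made up
for the example.

```python
GB = 1024 ** 3  # assumption: sizes are reported in binary units


def bytes_per_doc(pri_store_bytes, doc_count):
    """Average on-disk footprint of one document in the primary shards."""
    return pri_store_bytes / doc_count


def estimate_store_bytes(sample_pri_store_bytes, sample_docs, target_docs, replicas):
    """Scale the sample linearly to target_docs and add the replica copies."""
    primary = bytes_per_doc(sample_pri_store_bytes, sample_docs) * target_docs
    return primary * (1 + replicas)


# Figures from the test_insert index: pri.store.size and docs.count.
pri_store = 106.4 * GB
docs = 44_985_382
raw_doc_bytes = 2 * 1024  # assumption: "approx 2K+" taken as 2 KiB

per_doc = bytes_per_doc(pri_store, docs)
print(f"{per_doc:.0f} bytes on disk per document "
      f"({per_doc / raw_doc_bytes:.2f}x the assumed raw size)")

total = estimate_store_bytes(pri_store, docs, target_docs=100_000_000, replicas=3)
print(f"{total / GB:.1f} GB estimated for 100M documents with 3 replicas")
```

The estimate is only as good as the sample: it should be taken from documents
and mappings that look like the real data.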

Most of the time the largest part of the index is the document store. In
your case I assume that LZ4 is too lightweight a compression algorithm to
compress your data efficiently. The high compression option which is coming
in Elasticsearch 2.0 might help.

--
Adrien

Hi,

How much space the data takes up on disk in Elasticsearch depends a lot on
your mappings. In addition to storing the source in the _source field, all
fields are by default also copied over to the _all field to allow free text
search across all fields. On top of this, Elasticsearch also indexes all
the fields in the source document, sometimes in multiple ways, which takes
up further space. The amount of data Elasticsearch needs to store can
therefore grow quite a bit before compression is applied.

You might be able to reduce the indexed size on disk by ensuring your
mappings are as efficient as possible, e.g. by disabling the _all field if
you do not need it.
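
As an illustration, a leaner mapping might look like the sketch below. It
assumes the Elasticsearch 1.x mapping syntax that was current at the time of
this thread, and the type name `doc` and field name `message` are
hypothetical. It disables the _all field and, as mentioned earlier in the
thread, norms on a field that is searched but not scored on.

```python
import json

# Index-creation body with a trimmed-down mapping (1.x syntax assumed).
mapping_body = {
    "mappings": {
        "doc": {  # hypothetical type name
            # Skip the catch-all copy of every field.
            "_all": {"enabled": False},
            "properties": {
                "message": {  # hypothetical field name
                    "type": "string",
                    # No scoring needed on this field, so drop its norms.
                    "norms": {"enabled": False},
                }
            },
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

Queries that do not name a field rely on _all by default, so check how your
searches are written before disabling it.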

Best regards,

Christian
