Elasticsearch Index Data Compression (v1.4.2)


(Sagar) #1

Hello everyone,
I have been using Elasticsearch for storing application logs.

Elasticsearch version: 1.4.2
Log Retention Policy: 30 days

Number of logs generated per month: 250 million
Number of shards per index: 5
Number of replicas for index: 1

The logs/documents in my index are not big, but the number of documents is enormous.

Some Stats of existing data and extrapolation based on the same:
0.7 million - 260 MB
250 Million - 92 GB

92 GB of data per site for just application logs sounds like too much to me.
So I am keen to know whether this index/log data can be compressed, and if so, what performance impact that would have.
My writes to Elasticsearch will be frequent and concurrent, while search requests will be relatively infrequent.

Please advise.

Appreciate it.

Regards,
Sagar Shah


(Christian Dahlqvist) #2

The size your log data takes up on disk depends a lot on the type of data you have and how you map and index it. The Logstash default template indexes all string fields as both analyzed and not_analyzed, which gives a lot of flexibility but can take up a lot of disk space. We published a blog post a while back that looked at how different mappings affect the on-disk size of indexed data for a few sample data types. It shows that even though Elasticsearch already applies compression when it indexes data, the indexed data on disk can still be larger than the raw data, depending on which mappings are used.
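As a sketch of what trimming the mapping can look like: the request below creates an index whose string fields are mapped as not_analyzed only, instead of the analyzed + not_analyzed multi-field that the Logstash template produces. The index name, type name, and field names here are purely illustrative, and note that dropping the analyzed copy also removes full-text search on those fields.

```shell
# Hypothetical example (Elasticsearch 1.x API): map string fields as
# not_analyzed only to save the disk space of the analyzed copy.
# Index/type/field names are illustrative, not from the original thread.
curl -XPUT 'http://localhost:9200/logs-2015.06.01' -d '{
  "mappings": {
    "logs": {
      "properties": {
        "message":  { "type": "string", "index": "not_analyzed" },
        "loglevel": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
```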


(Sagar) #3

Thank you Christian!
That helps :slight_smile:


(Sagar) #4

And if I understand it correctly, the default configuration of Elasticsearch already provides compression support. Is that correct?

Thanks again!


(Christian Dahlqvist) #5

Elasticsearch already compresses data internally by default. The current algorithm balances speed and compression, but the ability to specify more efficient compression is coming in version 2.0.
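For reference, the more efficient compression mentioned above is exposed in Elasticsearch 2.0 as an index codec setting; it is not available in 1.4.x. A minimal sketch (index name illustrative):

```shell
# Sketch (Elasticsearch 2.0+): create an index using the best_compression
# codec (DEFLATE for stored fields) instead of the default LZ4 codec.
# Trades some indexing/retrieval speed for smaller on-disk size.
curl -XPUT 'http://localhost:9200/logs-archive' -d '{
  "settings": {
    "index.codec": "best_compression"
  }
}'
```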


(Sagar) #6

Thank you Christian. Keen to use Elasticsearch v2.0 :slight_smile:


(Sagar) #7

There's one more finding on this.
I had an index on one of our QA boxes (a month-old index) with around 0.7 million records taking about 260 MB.
I created a new index on the same server with the same mapping, pulled all records one by one from the existing index, and pushed them into the new index.
Surprisingly, the new index (a day old), with the same mapping and settings, takes only 136 MB.
What could make such a big difference here?

Please clarify.

Appreciate!


(Christian Dahlqvist) #8

Do you have the same number of shards for both indices? Do you have any deleted documents in the older index, e.g. due to updates?

When comparing the size of indices, I generally optimise them first to ensure they are as compact as possible. Can you try optimising both indices and see if the size difference remains?
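A sketch of that comparison, assuming the Elasticsearch 1.x optimize API and illustrative index names:

```shell
# Force-merge each index down to one segment per shard so that their
# on-disk sizes are directly comparable (1.x _optimize endpoint).
curl -XPOST 'http://localhost:9200/old-index/_optimize?max_num_segments=1'
curl -XPOST 'http://localhost:9200/new-index/_optimize?max_num_segments=1'

# Then compare the sizes, e.g. via the index stats API:
curl 'http://localhost:9200/old-index,new-index/_stats/store'
```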


(Sagar) #9

Thanks Christian for your reply.
Both have the same number of shards (5).
But yes, some documents were deleted from the original index at some point, which was done with the help of the _ttl field.

How can I optimize those indices?

Please advise.

Appreciate it!


(Sagar) #10

I found this article:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-optimize.html

I can optimize this and compare the stats afterwards.
Is it an expensive process? Does it take a long time for a big index?
Does it block my incoming index requests while it's optimizing?


(Harlin) #11

Optimizing an index is extremely expensive, especially if it is optimized all the way down to one segment per shard. Also, never optimize an index that is still receiving index requests; it will cause all kinds of problems. I would only call optimize on an index that is no longer being indexed into, and only when the cluster is not doing too much else.
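Since the original index had documents deleted via _ttl, a lighter option than a full merge is worth noting: the 1.x optimize API also accepts an `only_expunge_deletes` flag, which merges only segments that contain deleted documents rather than collapsing everything to one segment. Index name below is illustrative:

```shell
# Cheaper alternative to a full optimize: reclaim space from deleted
# documents only, without merging down to one segment per shard.
curl -XPOST 'http://localhost:9200/old-index/_optimize?only_expunge_deletes=true'
```

The same caveat applies: run it on an index that is no longer being written to.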

