Correct sizing and compression on Elasticsearch


I am trying to work out some sizing ratios for an Elasticsearch cluster. This is my baseline:

**For 100 GB of raw data per day:**

* 110 GB of data are indexed (about +10% overhead for data types)
* 220 GB with 1 replica
* 6600 GB per month
* +15% disk space to avoid saturation (7590 GB)
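The arithmetic in the list above can be written out as a small sketch. The function and parameter names are hypothetical, and the 30-day month is an assumption inferred from 220 × 30 = 6600.

```python
def estimate_disk_gb(raw_gb_per_day, index_overhead_pct=10, replicas=1,
                     retention_days=30, headroom_pct=15):
    """Reproduce the sizing steps from the post (all values in GB)."""
    # Indexed size: raw data plus the overhead added by indexing.
    indexed = raw_gb_per_day * (100 + index_overhead_pct) / 100
    # Each replica stores one full extra copy of the primary shards.
    with_replicas = indexed * (1 + replicas)
    # Total kept on disk over the retention window (assumed 30 days).
    retained = with_replicas * retention_days
    # Extra free space so the disks do not saturate.
    provisioned = retained * (100 + headroom_pct) / 100
    return {
        "indexed_per_day": round(indexed, 1),
        "with_replicas_per_day": round(with_replicas, 1),
        "retained": round(retained, 1),
        "provisioned": round(provisioned, 1),
    }


sizing = estimate_disk_gb(100)
for step, gb in sizing.items():
    print(f"{step}: {gb} GB")
```

Changing `replicas` or `retention_days` shows how strongly those two factors drive the total, which is why the replies focus on compression and on dropping replicas for older data.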

**Disk space with Hot/Warm architecture:**

* Hot Data Node : Disk/RAM Ratio 30:1
* Warm Data Node : Disk/RAM Ratio 100:1
* 3 master nodes with limited sizing
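To see what these ratios imply in RAM, here is a rough sketch. The 7-day hot / 23-day warm split is purely an assumption for illustration (the post does not state one), and the helper name is hypothetical; the 220 GB per day and the 15% headroom come from the figures above.

```python
def ram_for_tier_gb(disk_gb, disk_to_ram_ratio):
    """RAM implied by a disk:RAM ratio (30 means 30 GB of disk per 1 GB of RAM)."""
    return disk_gb / disk_to_ram_ratio


DAILY_GB_WITH_REPLICA = 220    # from the post
HEADROOM_PCT = 15              # from the post
HOT_DAYS, WARM_DAYS = 7, 23    # assumption: the post gives no hot/warm split

# Disk needed per tier, including the free-space headroom.
hot_disk = DAILY_GB_WITH_REPLICA * HOT_DAYS * (100 + HEADROOM_PCT) / 100
warm_disk = DAILY_GB_WITH_REPLICA * WARM_DAYS * (100 + HEADROOM_PCT) / 100

# Total RAM across all nodes of each tier, using the 30:1 and 100:1 ratios.
hot_ram = ram_for_tier_gb(hot_disk, 30)
warm_ram = ram_for_tier_gb(warm_disk, 100)

print(f"hot:  {hot_disk:.0f} GB disk, {hot_ram:.1f} GB RAM")
print(f"warm: {warm_disk:.0f} GB disk, {warm_ram:.1f} GB RAM")
```

The totals are per tier, not per node; dividing them by the RAM of a single node gives a first guess at the node count.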

I think I have got something wrong, because that is a lot of GB for a database. Does Elasticsearch apply any compression? Do you have any ratios you could share?


Not answering the question, but don't forget to force merge read-only indices down to 1 segment (with flush enabled). This optimizes size, because merging physically removes documents that were only marked as deleted, and query time, because there are fewer Lucene segments to search :slight_smile:
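As a minimal sketch of that suggestion, the snippet below only builds the request for the force merge API (`POST /<index>/_forcemerge`) with its `max_num_segments` and `flush` query parameters; it does not send anything. The helper name and the index name are hypothetical.

```python
from urllib.parse import quote, urlencode


def forcemerge_request(index, max_num_segments=1, flush=True):
    """Return (method, path) for a force merge call; nothing is sent."""
    query = urlencode({
        "max_num_segments": max_num_segments,  # merge down to this many segments
        "flush": str(flush).lower(),           # flush the index after the merge
    })
    return "POST", f"/{quote(index, safe='')}/_forcemerge?{query}"


# Hypothetical index name, used only for illustration.
method, path = forcemerge_request("logs-2021.01.01")
print(method, path)
```

Force merging is best kept to indices that no longer receive writes, as the reply says.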

Hi, thank you for your answer, I will make a note of it :slight_smile:

A few things that you also might want to look into:

  1. Index setting: codec. You can set index.codec to best_compression, which compresses the _source more and saves additional space.
  2. For historical data, you can look into searchable snapshots, which were added in 7.10. They let you offload the replica to an S3 bucket; if the primary shard goes down, Elasticsearch automatically starts restoring the snapshot to an available node. This saves you from having to keep replicas for older data.
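For the first point, a minimal sketch of the create-index request is shown below. It only builds the method, path and JSON body and does not contact a cluster; the helper name and index name are hypothetical. Note that `index.codec` is a static setting, so it has to be given at index creation or changed on a closed index.

```python
import json


def create_index_request(index, codec="best_compression"):
    """Return (method, path, body) for creating an index with a given codec."""
    body = {
        "settings": {
            "index": {
                # "default" uses LZ4; "best_compression" trades speed for size.
                "codec": codec,
            }
        }
    }
    return "PUT", f"/{index}", json.dumps(body)


# Hypothetical index name, used only for illustration.
method, path, body = create_index_request("logs-000001")
print(method, path)
print(body)
```

If the codec is changed on an existing index, already written segments keep their old compression until they are rewritten by a merge, which is one more reason to force merge read-only indices.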

I would recommend reading this section in the documentation.
