I am trying to work out some sizing ratios for an Elasticsearch cluster. Here is my baseline:
**For 100 GB of raw data per day:**
* 110 GB of data are indexed (~ +10% for datatypes)
* 220 GB with 1 replica
* 6600 GB per month
* +15% disk space to avoid saturation (7590 GB)
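As a quick sanity check of the arithmetic above, here is a minimal sketch in Python. The function name and the 30-day month are assumptions made for illustration; the percentages are the ones from the list.

```python
# Rough disk sizing sketch based on the figures listed above.
# Assumption: a month is counted as 30 days.

def monthly_disk_gb(raw_gb_per_day=100, index_overhead=0.10,
                    replicas=1, days=30, headroom=0.15):
    """Return the intermediate and final disk estimates in GB."""
    indexed = raw_gb_per_day * (1 + index_overhead)   # raw data + indexing overhead
    with_replicas = indexed * (1 + replicas)          # primaries + replica copies
    per_month = with_replicas * days                  # retention of one month
    with_headroom = per_month * (1 + headroom)        # spare space to avoid saturation
    return {
        "indexed_per_day": round(indexed, 1),
        "with_replicas_per_day": round(with_replicas, 1),
        "per_month": round(per_month, 1),
        "per_month_with_headroom": round(with_headroom, 1),
    }


if __name__ == "__main__":
    for name, value in monthly_disk_gb().items():
        print(f"{name}: {value} GB")
```

Changing `replicas` or `days` shows how quickly the total grows with retention and replication.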
**Disk space with Hot/Warm architecture:**
* Hot Data Node : Disk/RAM Ratio 30:1
* Warm Data Node : Disk/RAM Ratio 100:1
* 3 master nodes with limited sizing
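To turn these ratios into RAM figures, a rough sketch could look like the following. The split of 7 days on hot nodes and 23 days on warm nodes is purely an assumption for illustration (it is not part of my baseline), and the helper name is hypothetical.

```python
# Sketch: total RAM implied by a disk/RAM ratio for each tier.
# Assumptions (not from the baseline): 7 days kept on hot nodes,
# 23 days on warm nodes, 30 days in total.

DAILY_DISK_GB = 253   # 220 GB/day with 1 replica, plus 15% headroom
HOT_RATIO = 30        # hot data node, disk:RAM = 30:1
WARM_RATIO = 100      # warm data node, disk:RAM = 100:1


def ram_needed_gb(disk_gb, disk_to_ram_ratio):
    """RAM (GB) needed across a tier to hold disk_gb at the given ratio."""
    return disk_gb / disk_to_ram_ratio


if __name__ == "__main__":
    hot_disk = DAILY_DISK_GB * 7     # hypothetical hot retention
    warm_disk = DAILY_DISK_GB * 23   # hypothetical warm retention
    print(f"hot:  {hot_disk} GB disk, {ram_needed_gb(hot_disk, HOT_RATIO):.1f} GB RAM")
    print(f"warm: {warm_disk} GB disk, {ram_needed_gb(warm_disk, WARM_RATIO):.1f} GB RAM")
```

Dividing the RAM total of a tier by the RAM per node then gives an approximate node count for that tier.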
In fact, I think I am getting something wrong: that is a lot of GB for a database. Does Elasticsearch apply any compression? Do you have some ratios to share?
Hi,
This doesn't answer the question directly, but don't forget to force merge read-only indices (with flush enabled, down to 1 segment). That saves disk space, since merging physically removes documents that were only marked as deleted, and improves query time, since there are fewer Lucene segments to search.
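For reference, a force merge is a `POST` to the index's `_forcemerge` endpoint with the `max_num_segments` and `flush` parameters. Below is a minimal sketch that only builds the request; the cluster URL and the index name `logs-2021.01` are placeholders, and the helper name is hypothetical.

```python
# Sketch: build (but do not send) a force merge request.
# The base URL and index name used below are placeholders.
from urllib.parse import urlencode
import urllib.request


def build_forcemerge_request(base_url, index, max_num_segments=1, flush=True):
    """Return a POST request for <base_url>/<index>/_forcemerge."""
    query = urlencode({
        "max_num_segments": max_num_segments,  # merge down to this many segments
        "flush": str(flush).lower(),           # flush the index after the merge
    })
    url = f"{base_url.rstrip('/')}/{index}/_forcemerge?{query}"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    req = build_forcemerge_request("http://localhost:9200", "logs-2021.01")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; only do that on indices that no longer receive writes.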
For historical data, you can look into searchable snapshots, which were added in 7.10. This allows you to offload the replica to an S3 bucket; in the event the primary shard goes down, Elasticsearch will automatically start restoring the snapshot to an available node. This saves you from having to keep replicas for older data. https://www.elastic.co/blog/introducing-elasticsearch-searchable-snapshots