Hello. We have a hard limit of 4 TB of disk storage that can be allocated to an individual node in a cluster.
My question is: what happens when we start nearing the 4 TB threshold? Also, is there a way to archive data outside the cluster while keeping it available?
Elasticsearch has thresholds that trigger when a node's disk usage reaches certain defined percentages.
Basically you have the low watermark, the high watermark, and the flood stage watermark.
The low watermark defaults to 85% of disk usage. When a node reaches this stage, Elasticsearch stops allocating new shards to it, but this only affects replica shards, not primary shards.
The high watermark defaults to 90% of disk usage. When a node reaches this stage, Elasticsearch will try to relocate shards away from the node.
And the flood stage defaults to 95%. When a node reaches this stage, every index that has a shard on that node is set to read-only.
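If you want to see or adjust those thresholds yourself, something like the following sketch should work (the values shown are just the defaults restated explicitly; adjust them to your own needs):

```console
# Check current disk usage per node
GET _cat/allocation?v

# Set the watermarks explicitly via the cluster settings API
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}
```

Note that the watermarks can also be set as absolute byte values (e.g. "100gb" of free space) instead of percentages, which may be easier to reason about with a fixed 4 TB limit.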
About archiving data outside the cluster, you can use snapshots for that; this is explained in this part of the documentation. Keep in mind, though, that data in snapshots is not searchable unless you have an Enterprise license (searchable snapshots) — otherwise you need to restore a snapshot back into the cluster before you can query it.
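As a rough sketch of the snapshot workflow: you first register a snapshot repository, then take snapshots into it. The repository name, type, and location below are just examples for illustration (a shared filesystem repository; you could also use a cloud repository type such as S3 if the plugin/configuration is in place):

```console
# Register a shared-filesystem snapshot repository
# ("my_backup" and the path are example values)
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup"
  }
}

# Take a snapshot of the cluster into that repository
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
```

After the snapshot completes, the archived indices can be deleted from the cluster to free disk space and restored later from the repository if needed.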