We're currently using Elasticsearch 8.8.1 and storing our daily snapshots in S3 with the 'Intelligent_Tiering' storage class. In an effort to optimize costs, we've been looking at AWS's documentation on Intelligent-Tiering, which says that "objects not accessed for 180 days are moved to the Deep Archive Access tier with up to 95% in storage cost savings."
However, it's important to note that this could impact data availability, as data won't be instantly accessible. Has anyone experimented with this approach? Are there any risks of data loss associated with it?
I understand that Glacier Deep Archive is not supported, but the "Deep Archive Access tier" is essentially part of "Intelligent-Tiering," which is now natively supported by Elasticsearch snapshots.
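For reference, this is roughly how we register the repository with the intelligent_tiering storage class (just a sketch; the cluster endpoint, credentials, repository name, and bucket name are placeholders):

```python
# Minimal sketch: register an S3 snapshot repository whose objects are
# written with the intelligent_tiering storage class (a repository-s3
# setting). All names, credentials, and the endpoint are placeholders.
import requests

ES_URL = "https://localhost:9200"          # placeholder cluster endpoint
AUTH = ("elastic", "<password>")           # placeholder credentials

body = {
    "type": "s3",
    "settings": {
        "bucket": "my-snapshot-bucket",            # placeholder bucket
        "base_path": "daily-snapshots",            # placeholder prefix
        "storage_class": "intelligent_tiering",    # repository-s3 setting
    },
}

resp = requests.put(
    f"{ES_URL}/_snapshot/daily_s3_repo",   # placeholder repository name
    json=body,
    auth=AUTH,
    verify=False,                          # demo only: skip TLS verification
)
resp.raise_for_status()
print(resp.json())
```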
I haven't tried it, but I'm pretty sure this means it won't work with Elasticsearch snapshots today. Elasticsearch needs the data to be instantly accessible.
From the S3 documentation: "If the object you are retrieving is stored in the optional Deep Archive tier, before you can retrieve the object you must first restore a copy using RestoreObject."
Elasticsearch definitely doesn't know how to do this.
However, if we manually initiate the RestoreObject process, the object will automatically move back to the Frequent Access tier and become accessible again. Elasticsearch should then be able to read it, right?
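To be concrete, this is the kind of manual restore I have in mind (just a boto3 sketch; the bucket and object key are placeholders, and in reality every object that makes up the snapshot would need restoring, not just one). My understanding from the AWS docs is that restore requests for Intelligent-Tiering objects don't take a Days value:

```python
# Sketch of manually restoring an archived snapshot object with boto3.
# Bucket name and key are placeholders. For Intelligent-Tiering objects in
# the Archive / Deep Archive Access tiers, the object moves back to the
# Frequent Access tier once the restore completes.
import boto3

s3 = boto3.client("s3")

BUCKET = "my-snapshot-bucket"                      # placeholder bucket
KEY = "daily-snapshots/indices/abc123/0/__xyz"     # placeholder object key

# Kick off the restore (Standard retrieval; Bulk is cheaper but slower).
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={"GlacierJobParameters": {"Tier": "Standard"}},
)

# Check restore progress: the Restore field reports ongoing-request="false"
# once the object is readable again; ArchiveStatus shows the current tier.
head = s3.head_object(Bucket=BUCKET, Key=KEY)
print(head.get("Restore"), head.get("ArchiveStatus"))
```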
On the other hand, one potential drawback to consider is that snapshot retention may not function correctly, as the archived data won't be immediately accessible.
One suggestion, though I'm not sure it will work for you, is to use another cloud provider, in this case Google Cloud Storage.
We had similar requirements to save money on snapshots and decided to use GCP because of how their storage tiering works.
When you move objects between storage classes, GCS does not change the objects themselves; it changes the pricing of the API requests related to them. The data is still immediately accessible, but it costs more to list, read, and retrieve.
In our case we have daily and monthly indices, so we have a lifecycle policy that changes the storage class from Standard to Coldline after 31 days (sketched below).
Snapshot retention is not affected by this, since DELETE operations on the objects are free in any storage class after their required retention time.
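For reference, the Standard-to-Coldline policy mentioned above looks roughly like this with the google-cloud-storage Python client (the bucket name is a placeholder; a JSON lifecycle config applied via gcloud or the console works just as well):

```python
# Sketch: add a lifecycle rule that moves objects from Standard to Coldline
# 31 days after creation. Bucket name is a placeholder.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-snapshot-bucket")   # placeholder bucket

# Appends a SetStorageClass rule to the bucket's existing lifecycle rules.
bucket.add_lifecycle_set_storage_class_rule(
    "COLDLINE",
    age=31,                                # days since object creation
    matches_storage_class=["STANDARD"],    # only touch Standard objects
)
bucket.patch()

# Print the resulting rules as a sanity check.
for rule in bucket.lifecycle_rules:
    print(rule)
```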
GCS's Autoclass feature is pretty similar to S3's Intelligent-Tiering; it's just that S3 goes even colder than GCS does with its Deep Archive Access tier. But that's an opt-in feature, so Intelligent-Tiering should work fine on S3 as long as you don't opt in (the opt-in config is sketched below).
(NB I haven't done a pricing comparison between the two, maybe GCS is cheaper, but maybe not)
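For completeness, the opt-in looks like this on the S3 side (a boto3 sketch; the bucket name and configuration Id are placeholders). If you never create an archive configuration like this, Intelligent-Tiering only moves objects between tiers that remain instantly accessible:

```python
# Sketch: explicitly opt a bucket in to the Deep Archive Access tier after
# 180 days without access. Bucket name and configuration Id are placeholders.
# Without a configuration like this, Intelligent-Tiering never puts objects
# into a tier that needs a RestoreObject call before reads.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-snapshot-bucket",      # placeholder bucket
    Id="archive-after-180-days",      # placeholder configuration name
    IntelligentTieringConfiguration={
        "Id": "archive-after-180-days",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```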