I have a question about archiving Elasticsearch data. We take manual snapshots of our Elasticsearch data every 6 hours and store them in S3. Say our cluster has 3 indices, each with a different retention policy: 7 days, 14 days, and 30 days respectively.
Now, at some point we may want to query data as old as 6 months. With the retention policies in place, how can we query that old data if the indices are kept for only a few days?
Since snapshots are incremental, for the index with the 7-day retention policy, the snapshot taken on the 8th day would not contain the data from the previous 7 days, only the data created on the 8th day (if I'm not wrong). So I am not sure how to handle the historical data of all the indices.
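For reference, querying data that has aged out of the cluster usually means restoring the index from a snapshot that still contains it. Below is a minimal sketch of how such a restore request could be assembled for the snapshot restore REST endpoint (`POST /_snapshot/<repository>/<snapshot>/_restore`). The repository name `s3_backups`, the snapshot name, the index name `logs-7d`, and the helper `build_restore_request` are all hypothetical, and the sketch only builds the request rather than sending it.

```python
import json


def build_restore_request(repository, snapshot, index, suffix="_restored"):
    """Build the method, path, and JSON body for a snapshot restore call.

    All names passed in are placeholders; nothing is sent to a cluster here.
    """
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {
        # Restore only the index we want to query.
        "indices": index,
        # Leave cluster-wide settings and templates untouched.
        "include_global_state": False,
        # Restore under a new name so it cannot clash with a live index.
        "rename_pattern": "(.+)",
        "rename_replacement": f"$1{suffix}",
    }
    return "POST", path, body


# Hypothetical repository, snapshot, and index names.
method, path, body = build_restore_request(
    "s3_backups", "snapshot-2019.06.01-00", "logs-7d"
)
print(method, path)
print(json.dumps(body, indent=2))
```

Restoring under a renamed index keeps the historical copy separate from the live index, so it can be queried and then deleted without touching current data.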
So, for now we prefer to query data only within each index's retention period. If we decide to change this, we will move the snapshots stored in S3 to S3 Glacier (and restore them later when needed) through a lifecycle management policy, to reduce storage cost and keep the historical data for as long as we want.
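As a rough illustration of the lifecycle idea, here is a sketch of an S3 lifecycle configuration that transitions objects under a prefix to Glacier after a number of days. The prefix, the day counts, the rule ID, and the helper `build_glacier_lifecycle` are assumptions for illustration only. It also assumes the prefix holds a repository that is no longer being written to, since newer incremental snapshots can reference older files, and that objects would be restored to standard storage before Elasticsearch reads them again.

```python
def build_glacier_lifecycle(prefix, transition_days, expire_days=None):
    """Return a lifecycle configuration dict in the shape S3 expects.

    prefix, transition_days, and expire_days are illustrative values,
    not recommendations.
    """
    rule = {
        "ID": "archive-es-snapshots",  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": transition_days, "StorageClass": "GLACIER"}
        ],
    }
    if expire_days is not None:
        # Optionally delete archived objects after a longer period.
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}


# Example: archive after 30 days, delete after roughly 6 months.
config = build_glacier_lifecycle("es-snapshots/archive/", 30, expire_days=180)
print(config)

# With boto3 installed and credentials configured, this dict could be
# applied to a bucket (bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-snapshot-bucket", LifecycleConfiguration=config)
```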