AWS ElasticSearch - Manual Snapshot Archival

Hi Everyone,

I have a question about archiving Elasticsearch data. We take manual snapshots of our Elasticsearch data every 6 hours and store them in storage such as S3. Say we have 3 indices, each with a different retention policy: 7 days, 14 days and 30 days respectively.

At some point we will want to query data as old as 6 months. Since the retention policies mean each index only holds a few days or weeks of data, how would we be able to query the older data?

Any inputs would be appreciated!

Thanks,
Mehak

I think you'd need to restore the snapshots.
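As a rough sketch of what a restore could look like, the snippet below only builds the request for the snapshot restore API (`POST /_snapshot/<repository>/<snapshot>/_restore`) and does not send it. The repository, snapshot and index names are made up for illustration, and the `_restored` suffix is just one way to avoid clashing with an index that still exists in the cluster.

```python
import json


def build_restore_request(repository, snapshot, indices, rename_suffix="_restored"):
    """Build (method, path, body) for a snapshot restore call.

    All names passed in are hypothetical; nothing is sent to a cluster here.
    """
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {
        # Comma-separated list of indices to restore from the snapshot.
        "indices": ",".join(indices),
        # Leave cluster-wide state (templates, settings) untouched.
        "include_global_state": False,
        # Restore under a new name so a live index is not overwritten.
        "rename_pattern": "(.+)",
        "rename_replacement": f"$1{rename_suffix}",
    }
    return "POST", path, body


# Hypothetical names, for illustration only.
method, path, body = build_restore_request(
    "my-s3-repo", "snapshot-2019-01-15", ["logs-7d"]
)
print(method, path)
print(json.dumps(body, indent=2))
```

The resulting path and body can then be sent with whatever signed HTTP client you already use for your domain.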

In the future, searching directly in snapshots will hopefully be available. I'm not sure that will be possible with the service you're using, though.

BTW did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is one way to get access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, APM, Logs UI, Infra UI, SIEM, Maps UI, AppSearch, and what is coming next :) ...

@dadoonet, thanks for your inputs.

Since the snapshots are incremental, for the index with a 7-day retention policy, the snapshot taken on the 8th day wouldn't contain the data from the previous 7 days, only the data created on the 8th day (if I'm not wrong). I am not sure how to handle the historical data of all the indices in that case.

Don't think of snapshots as incremental backups. Each one is effectively a full backup of the indices it covers, even though only the changes since the last run are actually copied to the repository.
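To illustrate the idea, here is a toy model (not how Elasticsearch is implemented internally, just a simplified picture): each snapshot records every segment file it needs, but a file is only copied if the repository does not already hold it. The class and segment names are invented for the example.

```python
class ToyRepository:
    """Simplified model of a snapshot repository with shared segment files."""

    def __init__(self):
        self.blobs = set()     # segment files physically stored in the repository
        self.snapshots = {}    # snapshot name -> every segment file it references

    def snapshot(self, name, live_segments):
        """Take a snapshot; return only the files that had to be copied."""
        copied = set(live_segments) - self.blobs
        self.blobs |= copied
        # The snapshot references ALL live segments, not just the new ones.
        self.snapshots[name] = set(live_segments)
        return copied

    def delete(self, name):
        """Delete a snapshot; keep files still referenced by other snapshots."""
        del self.snapshots[name]
        still_needed = set().union(*self.snapshots.values())
        self.blobs &= still_needed


repo = ToyRepository()
copied_day1 = repo.snapshot("day-1", {"seg_a", "seg_b"})
copied_day2 = repo.snapshot("day-2", {"seg_a", "seg_b", "seg_c"})
repo.delete("day-1")

print(sorted(copied_day2))              # only the new segment was copied
print(sorted(repo.snapshots["day-2"]))  # but day-2 is still complete on its own
```

In this model, deleting `day-1` does not hurt `day-2`, because the shared files stay in the repository as long as any snapshot references them.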

If you want to keep snapshots that contain indices you have since deleted, you may need to create time-based repositories instead of using a single cluster-wide one.
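A minimal sketch of one way to do that, assuming one repository per calendar month: the helper below derives a repository name and an S3 `base_path` from a date and builds the registration body for `PUT /_snapshot/<name>`. The bucket name, prefix and monthly granularity are assumptions for illustration. On the AWS-managed service the settings typically also need `region` and `role_arn`, which are left out here.

```python
from datetime import date


def monthly_repository(day, bucket, prefix="es-snapshots"):
    """Return (name, path, body) for registering a per-month S3 repository.

    The naming scheme is hypothetical; pick whatever granularity suits you.
    """
    name = f"{prefix}-{day:%Y-%m}"
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            # A separate path per month keeps each repository self-contained.
            "base_path": f"{prefix}/{day:%Y/%m}",
        },
    }
    return name, f"/_snapshot/{name}", body


# Hypothetical bucket name, for illustration only.
name, path, body = monthly_repository(date(2019, 3, 14), "my-snapshot-bucket")
print(name, path)
print(body["settings"]["base_path"])
```

Snapshots taken during a month go into that month's repository, so an old repository can be kept around untouched even after the indices in it have been deleted from the cluster.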

Yes, right. Thank you.

@Christian_Dahlqvist, alright, I will have a look at this approach. Thank you for your inputs.

For now, we have decided to query data only within the retention period of each index. If we change this later, we will move the snapshots stored in S3 to S3 Glacier through a lifecycle management policy (and restore them when needed), to reduce storage cost while keeping the historical data for as long as we want.

Thank you!
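For reference, a lifecycle rule of that kind could look roughly like the sketch below. It only builds the configuration dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the prefix, rule ID and the 30-day threshold are assumptions. Note that objects moved to Glacier have to be restored back to S3 before Elasticsearch can read the repository again, and since snapshots in one repository share files, it seems safer to transition a whole repository path that is no longer written to rather than individual snapshots.

```python
def glacier_lifecycle(prefix, days_before_glacier):
    """Build an S3 lifecycle configuration that moves a prefix to Glacier.

    The prefix and threshold are hypothetical values chosen by the caller.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


config = glacier_lifecycle("es-snapshots/2019/03/", 30)
print(config["Rules"][0]["ID"])

# Applying it would look something like this (not executed here):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-snapshot-bucket", LifecycleConfiguration=config)
```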
