Snapshot data based on query

sanju_yadav · September 22, 2019, 7:37am

I have a large index containing the last 3 years of data. I now want to set up snapshots to guard against any data mishap. I want to include only the last 3 months of data in the first snapshot and then move to incremental snapshots. Can I do this?

Christian_Dahlqvist · September 22, 2019, 8:02am

Snapshots work at the index level copying complete segments, so if all your data is in a single index you will need to snapshot all of it.
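To illustrate the index-level granularity, here is a minimal sketch of how a create-snapshot request could be assembled. The repository name `my_backup`, the snapshot name `snapshot_1`, and the monthly `logs-*` index names are hypothetical placeholders, not taken from your cluster. Note that the request body selects whole indices and has no field for a query.

```python
import json


def build_snapshot_request(repository, snapshot, indices):
    """Return (method, path, body) for a create-snapshot call.

    The body can only name whole indices; there is no query filter.
    """
    path = f"/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    body = {
        # Comma-separated list of complete indices to include.
        "indices": ",".join(indices),
        # Skip indices that do not exist instead of failing the snapshot.
        "ignore_unavailable": True,
        # Leave cluster-wide state out of this snapshot.
        "include_global_state": False,
    }
    return "PUT", path, body


# Hypothetical repository, snapshot, and index names for illustration only.
method, path, body = build_snapshot_request(
    "my_backup",
    "snapshot_1",
    ["logs-2019.07", "logs-2019.08", "logs-2019.09"],
)
print(method, path)
print(json.dumps(body, indent=2))
```

With time-based indices like these, limiting a snapshot to recent data is just a matter of which index names you list. Snapshots into the same repository are also incremental by design: segment files that are already stored there are not copied again.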

sanju_yadav · September 27, 2019, 7:06am

thanks

sanju_yadav · September 30, 2019, 5:37am

Why does Elasticsearch not provide snapshots based on a timestamp query, the way MySQL lets you take a snapshot of data created before some date? I am not able to understand this. Please share your thoughts on this.

Christian_Dahlqvist · September 30, 2019, 5:45am

Elasticsearch often handles considerably larger data volumes than your typical relational database like MySQL does. I have seen clusters with over a petabyte of data. To take backups at that scale, the backup process must be efficient in terms of computation and disk I/O. Retrieving documents based on a query is much, much more expensive (it results in lots of random-access disk reads due to how Lucene works) than copying the full segment/index files that Lucene creates, which is basically what the snapshot/restore mechanism does.

In your case you could reindex the last 3 months based on a query into a new index or set of time-based indices and then delete the current index. If you have a large data volume, this is likely to take time and result in a lot of disk I/O.
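As a rough sketch of that reindex step, the snippet below builds a `_reindex` request body that copies only documents from the last 3 months. The index names `big_index` and `recent_data`, the timestamp field `@timestamp`, and the base URL passed to `send_reindex` are assumptions for illustration; substitute whatever your mapping and cluster actually use. The `send_reindex` helper is defined but not called here, so nothing is sent to a cluster.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index,
                       timestamp_field="@timestamp", since="now-3M/d"):
    """Build a _reindex body that copies documents newer than `since`.

    `since` uses Elasticsearch date math: now minus 3 months, rounded to the day.
    """
    return {
        "source": {
            "index": source_index,
            "query": {"range": {timestamp_field: {"gte": since}}},
        },
        "dest": {"index": dest_index},
    }


def send_reindex(base_url, body):
    """POST the body to _reindex as a background task (not invoked below)."""
    request = urllib.request.Request(
        f"{base_url}/_reindex?wait_for_completion=false",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical index names; assumes documents carry an @timestamp date field.
body = build_reindex_body("big_index", "recent_data")
print(json.dumps(body, indent=2))
```

Running the reindex with `wait_for_completion=false` returns a task ID rather than blocking, which is usually more practical for a large copy. Once the new index is verified, it can be snapshotted on its own.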

sanju_yadav · October 1, 2019, 5:16am

thanks
