My ES index is a week-based index, e.g. index-2017-04-01 (index-year-month-week). I am trying to create backup snapshots of that index. Our indices are updated every second, and our backup API is called once a day. Is it possible for the segments present in Day 1's backup not to be present in Day 2's backup? Here is an example to make the requirement clear.
index-2017-04-01 has documents a,b,c on 1st
index-2017-04-01 has documents a,b,c,d,e,f on 2nd
My requirement is to have backups created as below.
Bckp 1 : a,b,c
Bckp 2 : d,e,f
so that each backup is independent when I restore it.
Any answers on this would be much appreciated.
Hello,
Good news: snapshots do that by default.
Whenever you make a snapshot, Elasticsearch will only copy segments that have not already been snapshotted.
It will only re-copy existing data if a merge happened.
Here is the relevant part of the docs:
The index snapshot process is incremental. In the process of making the index snapshot Elasticsearch analyses the list of the index files that are already stored in the repository and copies only files that were created or changed since the last snapshot. That allows multiple snapshots to be preserved in the repository in a compact form
When you delete a snapshot, we check whether its segments are used by any other snapshot, and only files that are no longer referenced are removed.
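To make the incremental behaviour described above concrete, here is a small toy model in Python. It is only a sketch of the idea, not Elasticsearch code: the class `ToyRepository`, its methods, and the segment names such as `seg_a` are all made up for illustration. It assumes a repository is just a set of stored files plus a record of which files each snapshot references.

```python
class ToyRepository:
    """Toy model of an incremental snapshot repository (hypothetical, not Elasticsearch code)."""

    def __init__(self):
        self.files = set()     # segment files physically stored in the repository
        self.snapshots = {}    # snapshot name -> set of segment files it references

    def snapshot(self, name, index_files):
        """Record a snapshot and return only the files that had to be copied."""
        current = set(index_files)
        copied = current - self.files   # files already in the repository are skipped
        self.files |= copied
        self.snapshots[name] = current  # the snapshot still references every file
        return copied

    def delete(self, name):
        """Delete a snapshot and return the files no other snapshot references."""
        removed = self.snapshots.pop(name)
        still_used = set().union(*self.snapshots.values())
        orphaned = removed - still_used
        self.files -= orphaned
        return orphaned


repo = ToyRepository()
day1 = repo.snapshot("day1", ["seg_a", "seg_b", "seg_c"])
day2 = repo.snapshot("day2", ["seg_a", "seg_b", "seg_c", "seg_d"])
# Assume a merge rewrote the four old segments into one new file, seg_m.
day3 = repo.snapshot("day3", ["seg_m", "seg_e"])
orphaned = repo.delete("day1")

print(sorted(day1), sorted(day2), sorted(day3), sorted(orphaned))
```

In this model the second snapshot copies only `seg_d`, yet it still references all four segments, so restoring it alone gives back the whole index as of that day. After the assumed merge, `seg_m` is a new file and is copied even though it holds old documents. Deleting `day1` removes nothing from the repository, because `day2` still references the same files.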
Firstly, the issue we have is that we do not keep the snapshot .dat files in the configured "path.repo" location; after a file is generated, it is shipped to a remote server. Is it possible to achieve the same in this scenario?
Secondly, when I back up at intervals in which no documents were added, I still see the snapshots being the same size. Shouldn't subsequent backups be smaller?