I'm looking to snapshot indexes on my local Windows system (single node).
I am able to create snapshots of indexes and restore them after deletion.
Currently I do this manually using the Snapshot and Restore APIs, but I would like to back up my indexes every day.
I see that Curator was developed for this. I installed Curator using the Windows MSI installer, but I'm not sure where to use the command-line interface or where to put my configuration file and action file.
Is Curator the right tool for automatic snapshots? If yes, how do I configure it and run it successfully? If not, what should I be using instead?
You will need to use a scheduler to call the Curator binary periodically. For example, to take a snapshot every day at 3am, you might have the scheduler run "C:\Program Files\elasticsearch-curator\curator.exe" --config C:\config.yml C:\action.yml (with config.yml and action.yml configured accordingly). Note that the path to curator.exe needs its own quotes because it contains a space.
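On Windows, the built-in Task Scheduler is one option for the scheduler. The sketch below only assembles a schtasks command for a daily 3am run and prints it; it does not register anything. The helper build_schtasks_command and the task name CuratorDailySnapshot are made up for illustration, and the paths are the example paths from this thread, so adjust them to your installation.

```python
def build_schtasks_command(task_name, curator_exe, config_file, action_file,
                           start_time="03:00"):
    """Return the argument list for a daily Windows scheduled task.

    Hypothetical helper for illustration; it builds the command but does not run it.
    """
    # The inner quotes protect the space in "Program Files" when the task runs.
    task_run = f'"{curator_exe}" --config {config_file} {action_file}'
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",        # run once per day
        "/ST", start_time,     # start time in 24-hour HH:mm format
        "/TN", task_name,      # task name (an assumption, pick your own)
        "/TR", task_run,       # the command the task will execute
    ]


command = build_schtasks_command(
    task_name="CuratorDailySnapshot",
    curator_exe=r"C:\Program Files\elasticsearch-curator\curator.exe",
    config_file=r"C:\config.yml",
    action_file=r"C:\action.yml",
)
print(command)
```

If the printed command looks right, you could run it from an elevated prompt or pass the list to subprocess.run on the Windows machine. I have not tested this against your setup, so treat it as a starting point rather than a verified recipe.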
I have a follow-up to my previous question.
Using a scheduler, suppose I create a snapshot of a single index every day at 3am for 8 consecutive days:
Day 1 would be the first backup of the index.
Day 2 to Day 7 would be incremental backups.
On Day 8, I would like to merge the Day 1 and Day 2 snapshots into a single snapshot (let's say with the same name as the snapshot created on Day 2), delete the Day 1 snapshot, and hold a maximum of 7 snapshots at any one time. Is this possible?
Can this be achieved by simply deleting the Day 1 snapshot and expecting the Day 2 snapshot to retain the segments of the Day 1 snapshot, since they belong to the same index?
Yes, snapshots are incremental, but they are incremental at the Elasticsearch segment level, not at the data level.
If I write 10 documents to Elasticsearch, they are flushed into a new index as a single segment, and I then take a snapshot, I will have written that single segment to the snapshot repository.
Now imagine that we do this 100 times. It would be easy to assume that each snapshot will only contain the new segment. But behind the scenes, Elasticsearch will have merged several of the segments during the course of the new indexing operations. Instead of 100 segments, I will likely have 20, and some of the segments will have 20, 40, 50, or even up to 100 documents. Any new segment—whether created by newly indexed documents or by merging existing segments—will be treated as a new segment to the snapshot API. As such, while the snapshots will be incremental, any segment that does not exist in the snapshot repository will be copied over, even if it contains documents that already exist in a different segment which was previously copied to the repository.
A snapshot contains a list of pointers to the segments which were present in the index or indices snapshotted. If a segment is already present in the repository, the snapshot will point to the pre-existing segment, so that segment will not be re-copied from the cluster. And, when it comes time to delete snapshots, if a newer snapshot has a reference pointer to a segment in a snapshot scheduled for deletion, that segment will not be deleted, as it is still required. No segment will be deleted from the repository so long as at least one snapshot has a reference to it.
There is no concept of "merging" snapshots. A snapshot points to segments. If a segment is not already in the repository, it will be copied across as part of the snapshot process.
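To make the pointer behaviour described above concrete, here is a toy model in Python. It is not Elasticsearch code and does not reflect the real repository format; the class ToyRepository and the segment names are invented purely to illustrate the idea that a snapshot is a set of references and that a segment is only removed once nothing references it.

```python
class ToyRepository:
    """Toy model of a snapshot repository (illustration only, not Elasticsearch)."""

    def __init__(self):
        self.segments = set()   # segment files physically stored in the repository
        self.snapshots = {}     # snapshot name -> frozenset of segments it points to

    def snapshot(self, name, index_segments):
        """Take a snapshot and return the segments that actually had to be copied."""
        wanted = frozenset(index_segments)
        copied = wanted - self.segments      # only segments not yet in the repository
        self.segments |= copied
        self.snapshots[name] = wanted
        return copied

    def delete(self, name):
        """Delete a snapshot and return the segments removed from the repository."""
        doomed = self.snapshots.pop(name)
        still_referenced = set().union(*self.snapshots.values())
        removed = doomed - still_referenced  # keep anything another snapshot needs
        self.segments -= removed
        return removed


repo = ToyRepository()
day1 = repo.snapshot("day-1", {"seg_a", "seg_b"})
# Between day 1 and day 2, seg_b is merged with new documents into seg_c.
day2 = repo.snapshot("day-2", {"seg_a", "seg_c"})
removed = repo.delete("day-1")

print(sorted(day1))     # ['seg_a', 'seg_b']  everything copied on the first snapshot
print(sorted(day2))     # ['seg_c']           only the new (merged) segment is copied
print(sorted(removed))  # ['seg_b']           seg_a survives because day-2 points to it
print(repo.snapshots["day-2"] <= repo.segments)  # True: day-2 is still fully restorable
```

In this model, deleting the Day 1 snapshot leaves the Day 2 snapshot complete without any merge step, which is why keeping a maximum of 7 snapshots comes down to deleting the oldest one.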