Hi there,
we are currently looking for the best way to back up our ES cluster. Maybe the pros could give us some advice.
Background:
- ES-Cluster (6.2.x) running on Azure VMs
- Repository target is a blob (using repository-azure)
- Two types of daily indices:
  - Short-term: we delete them after 7 days
  - Long-term: we delete them after ~3 years (maybe moving them into an archive cluster after x months if they have an impact on the current cluster)
- Main load will be from 12:00 to 22:00
- If the cluster is down, we can buffer all new documents for around 2 days
- The cluster grows by around 20 indices a day, each about 500 MB
- There is currently no use case where we have to write into an old index
- There is also no use case where we have to delete or update any document
- We want to have a snapshot every hour
Our current plan is:
- Run a force merge on all of yesterday's indices at night
- Create a snapshot including all indices every hour and delete snapshots older than 3(?) days
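To make the plan concrete, here is a rough Python sketch of the three REST calls it implies. It only builds the requests and picks the snapshots to delete; it does not talk to a cluster. The repository name `azure_backup`, the `hourly-` snapshot prefix, and the `-YYYY.MM.DD` index suffix are assumptions for illustration, not our real names.

```python
from datetime import datetime, timedelta, timezone

REPO = "azure_backup"  # hypothetical repository name


def force_merge_request(today):
    """Build the nightly force merge call for yesterday's daily indices."""
    yesterday = today - timedelta(days=1)
    # Assumption: every daily index name ends in -YYYY.MM.DD.
    pattern = "*-" + yesterday.strftime("%Y.%m.%d")
    return ("POST", "/" + pattern + "/_forcemerge?max_num_segments=1")


def snapshot_request(now):
    """Build the hourly snapshot call that includes all indices."""
    name = "hourly-" + now.strftime("%Y.%m.%d-%H")  # hypothetical naming
    return ("PUT", "/_snapshot/" + REPO + "/" + name, {"indices": "*"})


def expired_snapshots(snapshots, now, keep_days=3):
    """Return the names of snapshots that started more than keep_days ago.

    `snapshots` is a list of dicts carrying the "snapshot" and
    "start_time_in_millis" fields, as listed by GET /_snapshot/<repo>/_all.
    """
    cutoff_ms = (now - timedelta(days=keep_days)).timestamp() * 1000
    return [s["snapshot"] for s in snapshots
            if s["start_time_in_millis"] < cutoff_ms]


def delete_request(snapshot_name):
    """Build the delete call for one expired snapshot."""
    return ("DELETE", "/_snapshot/" + REPO + "/" + snapshot_name)
```

Each expired name would then go through `delete_request`. The value `keep_days=3` is only the placeholder from the plan above.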
Would this be a proper plan? Any advice? Should we add two repositories, one for short-term and one for long-term indices?
The question that came up while thinking about the backups:
Is there a performance impact if we just snapshot all indices? Let's say in one year we have 3650 long-term and 70 short-term indices, but only 20 of them are actively indexed.
So there would be no need for the cluster to check all ~3700 indices for each snapshot, because the latest state of the inactive ones is already in an earlier snapshot.
Is the snapshot API smart enough to know that lastChanged.Timestamp < lastSnapshot.Timestamp for those indices, or will it compare all the old documents of these ~3700 indices for every snapshot?
Would it be better if we call the snapshot using {"indices": "[list of all current-day indices]"} and take a full snapshot into the same repository once per day?
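As a sketch of that alternative, the snapshot request body could be built like this in Python. The `-YYYY.MM.DD` index suffix and the 03:00 slot for the daily full snapshot are assumptions chosen for illustration (03:00 is simply outside our main load window).

```python
from datetime import datetime


def indices_for_day(all_indices, day):
    """Return the index names that belong to the given day."""
    # Assumption: every daily index name ends in -YYYY.MM.DD.
    suffix = day.strftime("-%Y.%m.%d")
    return sorted(name for name in all_indices if name.endswith(suffix))


def snapshot_body(all_indices, now, full_hour=3):
    """Build the snapshot request body for the current hour.

    Once per day (at full_hour) every index is included; in all other
    hours only the current day's indices are listed. Returns None when
    there is nothing to snapshot for the current day.
    """
    if now.hour == full_hour:
        return {"indices": "*"}
    todays = indices_for_day(all_indices, now)
    if not todays:
        return None
    # The snapshot API takes the index list as one comma-separated string.
    return {"indices": ",".join(todays)}
```

The index names would come from the cluster, for example from `GET /_cat/indices`.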
Best regards,
Andreas