Hi folks,
In my last project, backing up the indices was not very important. If the data was gone, it was gone, so I only took a snapshot of the Kibana index once a day.
For my new project I need to create a backup plan, and not losing data matters much more.
Let's say the accessible data in ES has a retention time of 30 days for the timestamp-based indices. Additionally, we have some entity-centric indices which are not rotated.
As an example requirement, let's say I need a backup of the system and must be able to restore the cluster state from any point within the last 7 days. What is the right approach?
My understanding looks like the following; please give hints, correct me, or point out pitfalls:
- A snapshot is defined to back up all indices (a repository and policy sketch of what I mean follows below this list).
  - Is it best practice to snapshot all indices, or to exclude the .monitoring-* indices?
- Since snapshots are meant to be lightweight, let's say we run them once an hour (see the SLM policy sketch below the list).
  - Is there a best practice for the maximum frequency?
  - How lightweight are they? Will a running snapshot significantly reduce indexing or query performance?
- Snapshots are automatically deleted after 37 days (30 days retention time + 7 days restore window); the retention setting is also part of the policy sketch below.
  - The documentation says: "When a snapshot is deleted from a repository, Elasticsearch deletes all files that are associated with the deleted snapshot and not used by any other snapshots." -> What happens if a snapshot contains entity-centric indices like the Kibana index? What will happen to that snapshot?
  - Will it be merged with the following snapshots?
  - Will it be deleted, or will it stay forever?
  - If it stays, and the snapshot contains data of an entity-centric index as well as timestamp-rotated indices, will the data of the rotated indices be deleted inside the snapshot that is marked for deletion, to make it smaller?
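To make the first point concrete, here is a minimal sketch of the prerequisite repository registration I have in mind, assuming a shared-filesystem repository; the repository name (my_backup) and the location are placeholders, and the location would of course have to be whitelisted via path.repo in elasticsearch.yml on every node:

```
# Register a shared-filesystem snapshot repository.
# "my_backup" and the location are placeholders.
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup",
    "compress": true
  }
}
```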
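And here is a rough sketch of how I imagine the hourly schedule, the index selection, and the 37-day retention combined in one snapshot lifecycle management (SLM) policy, assuming a version that has SLM (7.4+); the policy name, snapshot name pattern, and min/max counts are placeholders, and whether excluding the .monitoring-* indices this way is recommended is exactly part of my question:

```
# One hourly SLM policy: snapshot everything except .monitoring-*,
# keep snapshots for 37 days (30 days retention + 7 days restore window).
PUT /_slm/policy/hourly-snapshots
{
  "schedule": "0 0 * * * ?",
  "name": "<hourly-snap-{now/d}>",
  "repository": "my_backup",
  "config": {
    "indices": ["*", "-.monitoring-*"],
    "include_global_state": true
  },
  "retention": {
    "expire_after": "37d",
    "min_count": 10,
    "max_count": 1000
  }
}
```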
If it's unclear what I am asking for, leave me a comment and I will try to explain it in more detail.
Additional question:
In relational databases like Oracle we have the possibility to write archive and redo logs, so in case of a restore you can recover without any data loss up to the last transaction before the database crashed. Is there an equivalent in Elasticsearch, or is the only option to run a small snapshot every few minutes to keep the window of potential data loss as small as possible? What are the best practices here if a customer says that data loss is unacceptable?
Thanks a lot, Andreas