I want to understand the relationship between the current "data" and snapshots. I am about to start work on a fairly small Elasticsearch application. At 2am each day a simple _snapshot is performed to a repository, with a file name parameter containing the current date/time. So each day's snapshot is written to a new file. But is each of these files a complete copy, or just the incremental changes since yesterday?
Also, if the drive holding the active data crashes at 2pm, have we lost 12 hours of new data?
If the drive holding the snapshots glitches and one or more of the files holding snapshots is corrupted, how much is lost?
Sorry if this is answered elsewhere, but I can't find this info in the documentation.
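A snapshot isn't a single file; it's a whole collection of files. Each snapshot is a complete, self-contained unit that can be restored on its own, although taking one only copies the segment files that are not already in the repository. See these docs: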
Each snapshot is also logically independent. When you delete a snapshot, Elasticsearch only deletes the segments used exclusively by that snapshot. Elasticsearch doesn’t delete segments used by other snapshots in the repository.
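To make the quoted behaviour concrete, here is a toy model in Python. It is not Elasticsearch code, and the class `ToyRepository` and the segment names are made up for illustration. It only sketches the idea that snapshots share segment files, that a new snapshot copies just the segments the repository doesn't hold yet, and that deleting a snapshot removes only the segments no other snapshot references.

```python
class ToyRepository:
    """Toy model of a snapshot repository (hypothetical, not Elasticsearch code)."""

    def __init__(self):
        self.snapshots = {}  # snapshot name -> set of segment names it references
        self.stored = set()  # segment files physically present in the repository

    def take_snapshot(self, name, live_segments):
        # Only segments that are not already in the repository get copied.
        new_files = set(live_segments) - self.stored
        self.stored |= new_files
        # The snapshot still references every segment it needs for a full restore.
        self.snapshots[name] = set(live_segments)
        return new_files

    def delete_snapshot(self, name):
        doomed = self.snapshots.pop(name)
        # Segments referenced by any remaining snapshot must be kept.
        still_used = set().union(*self.snapshots.values())
        removed = doomed - still_used
        self.stored -= removed
        return removed


repo = ToyRepository()
copied_monday = repo.take_snapshot("monday", {"seg_a", "seg_b"})
# By Tuesday seg_a is gone from the live index and seg_c is new.
copied_tuesday = repo.take_snapshot("tuesday", {"seg_b", "seg_c"})
removed = repo.delete_snapshot("monday")

print(sorted(copied_tuesday))  # only the new segment was copied
print(sorted(removed))         # only the segment exclusive to "monday" was deleted
```

After deleting "monday", the "tuesday" snapshot still has every segment it references, which is what makes each snapshot independently restorable even though the copying is incremental.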
Yes, you would lose everything written since the last snapshot. 24h is a long time to wait between snapshots. Typically people take snapshots every 30 minutes, sometimes even more frequently.
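One way to get a 30-minute schedule is a snapshot lifecycle management (SLM) policy, created with `PUT /_slm/policy/<policy_id>`. The sketch below only builds that request in Python. The repository name `my_repository`, the policy id `every-30-minutes`, the cluster URL, and the 7-day retention are all assumptions to adjust for your setup, and it assumes a cluster version that includes SLM and doesn't require authentication.

```python
import json
import urllib.request


def build_slm_policy(repository, schedule="0 0/30 * * * ?"):
    """Build an SLM policy body; the default cron expression fires every 30 minutes."""
    return {
        "schedule": schedule,
        # Date math in the name; SLM also appends a unique suffix to each snapshot.
        "name": "<snap-{now/d}>",
        "repository": repository,
        "config": {"indices": ["*"]},
        # Assumed retention period; choose whatever suits your data.
        "retention": {"expire_after": "7d"},
    }


def build_request(base_url, policy_id, policy):
    """Prepare (but do not send) the PUT request that creates or updates the policy."""
    return urllib.request.Request(
        url=f"{base_url}/_slm/policy/{policy_id}",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


policy = build_slm_policy("my_repository")
request = build_request("http://localhost:9200", "every-30-minutes", policy)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request)  # uncomment to send it to a real cluster
```

Because each snapshot only copies segments that are new since the previous one, frequent snapshots are usually much cheaper than the first one.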
It depends how lucky you are, but if files in the snapshot repository are corrupted you could potentially lose every snapshot. Take a backup of your repository to protect against this.