Best practices for automated incremental backups

Hi,

I've been looking into the backup and restore plan for our modest Elasticsearch cluster. We are currently set up in a "Hot/Warm" architecture, with Curator moving old indices from hot to warm. We currently have 2x Hot and 2x Warm nodes (we will be expanding this as soon as budget allows).

We use ZFS snapshots for incremental backups of most of our other services (https://docs.oracle.com/cd/E23824_01/html/821-1448/gbciq.html), and we then ship the snapshots off to a remote backup server using ZFS send/receive (https://docs.oracle.com/cd/E23824_01/html/821-1448/gbchx.html#scrolltoc). Services like MySQL, PostgreSQL, etc. all work fine with this method.

After running a successful ZFS snapshot every hour and rolling back to a recent snapshot, I'm unable to see any of the indices in Elasticsearch, despite the data being back on the filesystem:

df -h
Filesystem      Size  Used Avail Use% Mounted on
elasticsearch   6.4T  128K  6.4T   1% /data/elasticsearch

zfs rollback elasticsearch@2018-03-19_08:00:01

df -h
Filesystem      Size  Used Avail Use% Mounted on
elasticsearch   6.4T   47M  6.4T   1% /data/elasticsearch

Has anyone ever been able to successfully back up and restore ES using ZFS snapshots? Is there a better way to achieve an automated, incremental backup system within ES? We have no access to an HDFS filesystem; however, we could possibly look at AWS S3 (this would be highly dependent on our data storage terms with our partners).

Thanks!

You should not use filesystem backups to back up Elasticsearch indices. The appropriate way to ensure that both your data and your cluster state information are in sync is to use the Snapshot (and Restore) API. Performing a filesystem snapshot is likely to result in slight deviations between what the cluster state thinks is in the segment data, and what is actually in the segment data. This happens because a filesystem snapshot could capture the cluster metadata at one point, and the segments could be different even a split second later. This will almost certainly result in corrupted index data at some point.
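As a rough sketch (the repository name, path, and snapshot names below are placeholders, and a shared-filesystem repository assumes the path is whitelisted via path.repo in elasticsearch.yml on every node), registering a repository, taking a snapshot, and restoring it looks something like this:

# Register a shared-filesystem repository (the mount must be visible to all nodes)
curl -XPUT 'localhost:9200/_snapshot/backups' -H 'Content-Type: application/json' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}'

# Take a snapshot of all indices and wait for it to finish
curl -XPUT 'localhost:9200/_snapshot/backups/snapshot-2018.03.19_08?wait_for_completion=true'

# Restore from that snapshot (the target indices must be closed or deleted first)
curl -XPOST 'localhost:9200/_snapshot/backups/snapshot-2018.03.19_08/_restore'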

Snapshots in Elasticsearch are at the segment level, rather than at the document level. An Elasticsearch snapshot is an exact moment-in-time copy of the indices and their underlying segments. Incremental means that only segments not already in the repository are copied when subsequent snapshots of the same index are taken. Because segment merges happen continuously as documents are added and removed, some data (documents) may still be re-copied, since they may have been merged into a new segment since the last snapshot.
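To illustrate the incremental part (using the same hypothetical repository and naming as above): a second snapshot taken an hour later only copies segments that are not already in the repository, and the snapshot status API shows how much this particular snapshot actually had to copy.

# A second snapshot of the same indices; segments already in the repository are not re-copied
curl -XPUT 'localhost:9200/_snapshot/backups/snapshot-2018.03.19_09?wait_for_completion=true'

# Per-snapshot stats show how many files/bytes this snapshot actually copied
curl -XGET 'localhost:9200/_snapshot/backups/snapshot-2018.03.19_09/_status'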

The only way to get automated, incremental backups is to take snapshots via the Snapshot API on a schedule. Since you're already using Curator, you can register NFS, S3, Azure, Google Cloud Storage, or HDFS as a snapshot repository and have Curator take the periodic snapshots for you.
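A minimal sketch of that with Curator 5.x (the file paths, repository name, and hourly schedule are assumptions; the action file uses Curator's standard snapshot action):

# Write a hypothetical action file that snapshots all indices into the "backups" repository
cat > /etc/curator/snapshot.yml <<'EOF'
actions:
  1:
    action: snapshot
    description: "Hourly snapshot of all indices"
    options:
      repository: backups
      name: 'hourly-%Y.%m.%d.%H'
      wait_for_completion: True
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*'
EOF

# crontab entry: run it at the top of every hour, pointing at your existing Curator client config
0 * * * * /usr/local/bin/curator --config /etc/curator/config.yml /etc/curator/snapshot.yml

You can pair this with a delete_snapshots action on the same schedule to prune old snapshots from the repository.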
