I am implementing a backup solution for our Elasticsearch cluster.
All the indices on the whole cluster (10 data nodes, 3 master nodes) are stored on the same NetApp FAS8000 network share.
NetApp has incredibly fast snapshotting, and we would like to use it if it works as intended.
Do you think we can rely on snapshotting with NetApp, or should we use Elasticsearch's own snapshot API?
Will Elasticsearch be able to restore the backup from NetApp's file snapshot?
You could, but you would have to restore the entire cluster to the point in time you want. You won't be able to restore just a node or an index, because Elasticsearch does a number of things when it takes a snapshot to tie all the shards together at that point in time.
I would still prefer to use the Elasticsearch snapshot and restore feature. When you request a snapshot, Elasticsearch prevents all existing segments from being deleted until the snapshot is complete. This keeps the snapshot from being corrupted because a file it requires was deleted mid-copy (e.g. because it was merged into a new segment and is no longer needed). No matter how fast an external system's snapshotting is, I would think there will always be a window of time (however small) in which a file could be deleted mid-snapshot.
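For what it's worth, here is a minimal sketch of what the snapshot API route could look like: register a shared-filesystem repository, take a snapshot, and restore a single index from it. The node address, repository name, snapshot name, index name, and mount path are all assumptions made up for illustration, and the helper functions are hypothetical, not part of any client library. The repository location would also need to be listed under `path.repo` in `elasticsearch.yml` and mounted on every node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of any node in the cluster


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request against the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def register_repo(repo, location):
    # "fs" is the shared-filesystem repository type; location is a path
    # that every node can reach (hypothetical mount point in the usage below).
    return build_request(
        "PUT",
        f"/_snapshot/{repo}",
        {"type": "fs", "settings": {"location": location}},
    )


def create_snapshot(repo, snapshot):
    # wait_for_completion=true makes the call block until the snapshot is done.
    return build_request(
        "PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    )


def restore_index(repo, snapshot, index):
    # Restores only the named index, not the whole cluster.
    return build_request(
        "POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": index}
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Usage would be something like `send(register_repo("netapp_backup", "/mnt/es_backups"))`, then `send(create_snapshot("netapp_backup", "snap_1"))`, and later `send(restore_index("netapp_backup", "snap_1", "logs-2016"))`. Note that an existing open index has to be closed or deleted before it can be restored. The repository itself could still live on the NetApp share, so the two approaches are not mutually exclusive.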