ElasticSearch 1.0 Manual Backup

Hi all,

I want to upgrade a cluster that is 15TB in size, and before doing so I want to back up my indices. I would like to make sure I am not missing anything critical in the following approach. Initial testing seems fine and, while this may seem "hackish", I want a simple command-line solution that ensures that if a rolling upgrade fails, I can at least reconstruct the entire cluster confidently.

Background: Indices are created daily. Once a day has passed, its index is never written to again.

Backup:

  1. I call flush against the index (which is no longer being written to and never will be written to). I assume this makes sure that everything is properly written to disk.
  2. I ask for shard locations for the index via API calls, zip those locations, and push them to storage. In this case there are 5 shards per index and I only intend to copy the primary shards in order to save space.
  3. I go to the cluster's active master, zip the folder that contains metadata about the index, and push it to storage.
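
For anyone following along, here is a minimal Python sketch of steps 1 and 2, not the poster's actual script. It assumes a node reachable at `http://localhost:9200`, and it assumes the shard listing is fetched as plain text from `GET /_cat/shards/<index>?h=index,shard,prirep,state,node`. The function names `flush_request` and `primary_shards` are made up for illustration.

```python
import urllib.request

ES = "http://localhost:9200"  # assumption: address of any node in the cluster


def flush_request(index):
    """Build (but do not send) the POST request that flushes one index."""
    return urllib.request.Request(f"{ES}/{index}/_flush", method="POST")


def primary_shards(cat_shards_text):
    """Pick out started primary shards from _cat/shards text output.

    Expects the columns index, shard, prirep, state, node (in that order).
    Returns a sorted list of (shard_number, node_name) tuples.
    """
    primaries = []
    for line in cat_shards_text.splitlines():
        # Node names may contain spaces, so split into at most 5 fields.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # unassigned shards have no node column
        _index, shard, prirep, state, node = fields
        if prirep == "p" and state == "STARTED":
            primaries.append((int(shard), node.strip()))
    return sorted(primaries)
```

Sending the flush is then `urllib.request.urlopen(flush_request("my-index"))`, where `my-index` is a placeholder name. The parsed list tells you which node to visit for each primary shard; zipping and uploading the shard directories is left out because the paths depend on your `path.data` setting.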

Restore:

  1. Unzip the metadata folder on the master node.
  2. Unzip the data on a data node.
  3. Restart the master(s).

Questions:

  1. Does this seem reasonable? Am I missing anything?
  2. Are there possible points of failure that I should be aware of?
  3. Do I have to restart the master(s) or is there another way to have the cluster become aware of newly unzipped indices?

I appreciate your time,
Nate

Just double-checking here: you are aware of the snapshot/restore feature that would be way safer and quite possibly easier than what you're contemplating?

I was under the impression that 1.0 only offers fs backup. I am working in AWS and would rather not have to deal with attaching volumes for backup (even given EFS). I take it S3 backup is also available in ES 1.0.0? In which case that is great news!

Edit: Looking at the documentation for version 2.0.0RC1, which is what I am using, it looks like S3 is in fact supported. Thanks for your time!

The AWS plugin has existed for ages and will help you do that.

The same is also available for Azure and HDFS.
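
For reference, a rough Python sketch of what the snapshot/restore route looks like over the REST API. It only builds the requests; pass each one to `urllib.request.urlopen` to send it. The node address and the helper names are assumptions for illustration, and the `s3` repository type is only available once the AWS plugin is installed on the nodes.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: address of any node in the cluster


def json_request(method, path, body=None):
    """Build an HTTP request with an optional JSON body (not sent here)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{ES}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def register_s3_repo(repo, bucket):
    """PUT /_snapshot/<repo>: register an S3 bucket as a snapshot repository."""
    return json_request(
        "PUT", f"/_snapshot/{repo}",
        {"type": "s3", "settings": {"bucket": bucket}},
    )


def snapshot_indices(repo, snapshot, indices):
    """PUT /_snapshot/<repo>/<snapshot>: snapshot the listed indices."""
    return json_request(
        "PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        {"indices": ",".join(indices)},
    )


def restore_snapshot(repo, snapshot):
    """POST /_snapshot/<repo>/<snapshot>/_restore: restore a snapshot."""
    return json_request("POST", f"/_snapshot/{repo}/{snapshot}/_restore")
```

Since the daily indices are never written to after their day has passed, one snapshot per closed-out day should be enough. Keep in mind that a restore only works for indices that are closed or do not exist in the cluster yet.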