Elasticsearch 1.0 Manual Backup


(The Dude) #1

Hi all,

I want to upgrade a cluster that is 15TB in size, and before doing so I want to create a backup of my indices. I want to make sure I am not missing anything critical in the following approach. Initial testing looks fine and, while this may seem "hackish", I want a simple command-line solution that ensures that if a rolling upgrade fails, I can confidently reconstruct the entire cluster.

Background: Indices are created daily. Once a day has passed, the index is never written to again.

Backup:

  1. I call flush against the index (which is no longer written to, and never will be again). I assume this ensures everything is properly committed to disk (example calls for these steps follow the list).
  2. I ask for the shard locations of the index via API calls, zip those directories, and push them to storage. In this case there are 5 shards per index, and I only intend to copy the primary shards in order to save space.
  3. I go to the cluster's active master and zip the folder that contains metadata about the index, then push it to storage.
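
A minimal sketch of the above, assuming a hypothetical daily index named logs-2015.10.19, the default data layout, and a cluster named mycluster (adjust names and paths to your setup):

    # 1. Flush the index so pending operations are committed to disk
    curl -XPOST 'http://localhost:9200/logs-2015.10.19/_flush'

    # 2. Find which node holds each primary shard (prirep column: p = primary)
    curl 'http://localhost:9200/_cat/shards/logs-2015.10.19?v'

    # On each data node, a shard's files live under the default layout:
    #   <path.data>/<cluster.name>/nodes/0/indices/<index>/<shard>
    tar czf logs-2015.10.19-shard0.tar.gz \
        /var/lib/elasticsearch/mycluster/nodes/0/indices/logs-2015.10.19/0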

Restore:

  1. Unzip the metadata folder on the master node.
  2. Unzip the data on a data node.
  3. Restart the master(s) (a sketch of these steps follows this list).
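
A matching restore sketch, under the same assumptions (paths are hypothetical and depend on your path.data and cluster name):

    # On a data node: unpack the shard files back into the data path
    tar xzf logs-2015.10.19-shard0.tar.gz -C /

    # On the master: unpack the saved metadata folder into place the same way,
    # then restart so the cluster picks up the restored index
    sudo service elasticsearch restart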

Questions:

  1. Does this seem reasonable? Am I missing anything?
  2. Are there possible points of failure that I should be aware of?
  3. Do I have to restart the master(s) or is there another way to have the cluster become aware of newly unzipped indices?

I appreciate your time,
Nate


(Magnus Bäck) #2

Just double-checking here: you are aware of the snapshot/restore feature that would be way safer and quite possibly easier than what you're contemplating?
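
For reference, here is roughly what that looks like with a shared-filesystem repository; the repository name, location, snapshot name, and index name below are placeholders:

    # Register a repository; the location must be accessible to all nodes
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
      "type": "fs",
      "settings": { "location": "/mnt/backups/my_backup" }
    }'

    # Snapshot a single daily index and wait for completion
    curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snap-2015.10.19?wait_for_completion=true' -d '{
      "indices": "logs-2015.10.19"
    }'

    # Later, restore it by name
    curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snap-2015.10.19/_restore'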


(The Dude) #3

I was under the impression that 1.0 only offers filesystem (fs) backup. I am working in AWS and would rather not deal with attaching volumes for backups (even given EFS). I take it S3 backup is also available in ES 1.0.0? If so, that is great news!

Edit: Looking at the documentation for v2.0.0RC1, which is what I am actually using, it looks like S3 is in fact supported. Thanks for your time!
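
With the AWS plugin installed (see the next reply), registering an S3 repository looks roughly like this; the repository name, bucket, and region are placeholders, and credentials come from elasticsearch.yml or an IAM role:

    curl -XPUT 'http://localhost:9200/_snapshot/s3_backup' -d '{
      "type": "s3",
      "settings": {
        "bucket": "my-es-backups",
        "region": "us-east-1"
      }
    }'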


(David Pilato) #4

The AWS plugin has existed for ages and will help you do that.

The same is also available for Azure and HDFS.
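
For ES 2.x, the cloud plugins install like this (names per the 2.x plugin manager; the HDFS repository plugin ships separately with the es-hadoop project, so its install coordinates vary by version):

    bin/plugin install cloud-aws     # adds the s3 repository type
    bin/plugin install cloud-azure   # adds the azure repository type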

