I am in the position of wanting to upgrade a cluster that is 15TB in size. Before doing so, I want to create a backup of my indices, and I want to make sure I'm not missing anything critical in the following approach. Initial testing seems fine and, while this may seem "hackish", I want a simple command-line solution that ensures that if a rolling upgrade fails, I can confidently reconstruct the entire cluster.
Background: Indices are created daily. Once the day has passed, the index is never written to again.
- I call flush against the index (which, as noted above, will never be written to again); I assume this ensures that everything is properly persisted to disk.
- I ask for the index's shard locations via API calls, zip those locations, and push them to storage. In this case there are 5 shards per index, and I only intend to copy the primary shards in order to save space.
- I go to the cluster's active master, zip the folder that contains metadata about the index, and push it to storage.
- To restore: unzip the metadata folder on the master node.
- Unzip the data on a data node.
- Restart the master(s).
- Does this seem reasonable? Am I missing anything?
- Are there possible points of failure that I should be aware of?
- Do I have to restart the master(s), or is there another way to make the cluster aware of newly unzipped indices?
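For concreteness, here is roughly what the backup half of my steps looks like as a small script. This is a dry-run sketch that only prints the commands it would run; the host, index name, and data path are placeholders, not my actual setup:

```shell
#!/bin/sh
# Dry-run sketch of the per-index backup steps above.
# ES host, index name, and filesystem paths are placeholders.
ES="${ES:-http://localhost:9200}"
INDEX="${1:-myindex-2016.01.15}"

# Print each command instead of executing it, for illustration.
run() { echo "+ $*"; }

# 1. Flush the (now read-only) index so everything is persisted to disk.
run curl -s -XPOST "$ES/$INDEX/_flush"

# 2. List shard locations; the "prirep" column is "p" for primary shards,
#    which are the only ones I intend to copy.
run curl -s "$ES/_cat/shards/$INDEX?h=index,shard,prirep,state,node"

# 3. Zip each primary shard directory and push it to storage
#    (the data path here is a placeholder for the real node layout).
run tar czf "/backup/$INDEX.tar.gz" "/path/to/data/nodes/0/indices/$INDEX"
```

Restore would then be the reverse: unzip the archives onto a data node and the metadata onto the master, which is where my restart question comes from.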
I appreciate your time,