I've got a 48-node ES cluster running 0.20.5 that I'd like to upgrade to
0.90.3 with as little downtime as possible. I realise that this is a
tall order, as the release notes (and past experience) make it clear
that mixing versions is a Bad Idea, thus I can't simply roll through the
nodes one by one.
Any hints?
--
dan (phrawzty).
mozilla webops; european outpost.
No hints, other than to say that 0.90.3 was the first install/full restart
I've done on our 30-node cluster that didn't end up with split brain! Woot!
Usually one or two nodes fail to join the cluster. This time I was able to
shut down, restart, and go yellow in under two minutes. It probably could
have been faster if I had removed some of the pauses I added to "help" with
previous full-restart issues.
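
As a rough illustration of replacing fixed pauses with a check on the cluster itself, here is a minimal Python sketch that polls the `_cluster/health` REST endpoint until the status is at least yellow. The base URL, the timeout values, and the function names (`fetch_status`, `wait_for_status`) are assumptions made for this example, not anything taken from the thread.

```python
import json
import time
import urllib.request

# Order of the health states reported by the cluster health API.
RANK = {"red": 0, "yellow": 1, "green": 2}


def fetch_status(base_url="http://localhost:9200"):
    """Return the cluster status string ("red", "yellow" or "green").

    The base URL is an assumption; point it at any node of the cluster.
    """
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return json.load(resp)["status"]


def wait_for_status(wanted="yellow", fetch=fetch_status, timeout=120.0,
                    interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the cluster is at least `wanted`; return the seconds waited.

    Connection errors are treated as "node not up yet" and retried.
    Raises TimeoutError if `wanted` is not reached within `timeout` seconds.
    """
    start = clock()
    while True:
        try:
            status = fetch()
        except OSError:  # includes urllib.error.URLError and refused connections
            status = None
        if status is not None and RANK.get(status, -1) >= RANK[wanted]:
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("cluster did not reach %s (last status: %s)"
                               % (wanted, status))
        sleep(interval)
```

A restart script could call `wait_for_status("yellow")` right after starting the nodes instead of sleeping for a fixed time.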
If you are able to fit all your data on a subset of nodes and you're able
to keep two clusters in sync and have smarts to know which cluster to route
searches to (Eeesh, lots of ifs), you can run two clusters and switch nodes
from old to new one by one. This also gives you the capability to revert to
the old version if you run into issues with the new.
That needs a lot of smarts on your side, so the quickest and easiest option
is likely to shut everything down, upgrade every node, and start it back up.
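
For the two-cluster option, the routing "smarts" could in the simplest case look something like the Python sketch below: writes go to both clusters so they stay in sync, and searches go to whichever cluster is currently marked active. The `ClusterRouter` class and the `search`/`index` methods on the client objects are hypothetical names for this illustration; a real setup would also have to handle partial write failures and data already indexed before the switch.

```python
class ClusterRouter:
    """Mirror writes to two clusters and send searches to the active one.

    `old` and `new` are hypothetical client objects that expose
    `search(query)` and `index(doc)`; they stand in for whatever client
    library is actually in use.
    """

    def __init__(self, old, new):
        self._clients = {"old": old, "new": new}
        self.active = "old"  # searches start on the old cluster

    def switch_to(self, name):
        """Route searches to "old" or "new"; switching back is the revert path."""
        if name not in self._clients:
            raise ValueError("unknown cluster: %r" % (name,))
        self.active = name

    def search(self, query):
        # Only the active cluster answers searches.
        return self._clients[self.active].search(query)

    def index(self, doc):
        # Dual write: both clusters receive every document.
        for client in self._clients.values():
            client.index(doc)
```

Calling `router.switch_to("new")` moves search traffic over once the new cluster holds all the data, and `router.switch_to("old")` reverts it if the new version misbehaves.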
Best Regards,
Paul
On Wednesday, August 7, 2013 12:29:32 PM UTC-6, Andy Wick wrote:
No hints other than saying 0.90.3 was the first install/full restart I've
done on our 30 node cluster that didn't end up with split brain! Woot!
Usually one or two nodes don't join the cluster. So was able to shutdown,
restart, and go yellow in under two minutes. Probably could have been
faster if I remove some of the pauses I have that I added to "help" with
previous full restart issues.
We upgraded a 100+ node cluster with about 3 minutes of downtime. It wasn't
that bad; you just have to be prepared and have good tools.
Tools can be as simple as parallel-ssh and (some) bash scripts. That is
error-prone and kind of sketchy, but it is one of the simplest possible
solutions.
It is probably safer to use Chef, Puppet, or another automation framework
for more robustness and flexibility.
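
To make the parallel-ssh idea concrete, here is a small Python sketch that runs one command on many hosts at once over plain `ssh` and reports which hosts failed. The host names, the example command, and the function names (`run_ssh`, `run_everywhere`) are assumptions for illustration; it relies on key-based SSH login already being set up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ssh(host, command):
    """Run `command` on `host` via ssh and return its exit code.

    BatchMode=yes makes ssh fail instead of prompting for a password.
    """
    proc = subprocess.run(["ssh", "-o", "BatchMode=yes", host, command],
                          capture_output=True, text=True)
    return proc.returncode


def run_everywhere(hosts, command, runner=run_ssh, max_workers=16):
    """Run `command` on all hosts in parallel.

    Returns a dict mapping each failed host to its non-zero exit code;
    an empty dict means every host succeeded.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda h: runner(h, command), hosts))
    return {h: c for h, c in zip(hosts, codes) if c != 0}
```

For example, `run_everywhere(["es01", "es02"], "sudo service elasticsearch stop")` would stop the service on both (hypothetical) hosts, and the upgrade would only proceed if the returned dict is empty.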
Jerome
On Thursday, August 8, 2013 10:00:04 AM UTC-4, Nikolas Everett wrote:
On Thu, Aug 8, 2013 at 9:49 AM, Jérôme Gagnon <jerome....@gmail.com>
wrote:
We made the upgrade for a 100+ nodes cluster with a ~3 minutes downtime,
wasn't that bad, you just have to be prepared and have the good tools.
It'd be really useful if you could explain some of your good tools!
Hey! If I may chime in, you probably want to look into Ansible, which offers
very efficient and simple automation facilities that other provisioning
tools like Chef & Puppet don't really have. I am not affiliated with Ansible;
I just recently had an "a-ha!" moment with it for exactly this kind of
context.
Have fun,
Colin