Upgrade a 48-node cluster with minimal downtime?


(Daniel Maher-3) #1

Hello,

I've got a 48-node ES cluster running 0.20.5 that I'd like to upgrade to
0.90.3 with as little downtime as possible. I realise that this is a
tall order, as the release notes (and past experience) make it clear
that mixing versions is a Bad Idea, thus I can't simply roll through the
nodes one by one.

Any hints? :)

--
dan (phrawzty).
mozilla webops; european outpost.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Andy Wick) #2

No hints, other than saying that 0.90.3 was the first install/full restart I've
done on our 30-node cluster that didn't end up with split brain! Woot!
In previous restarts, one or two nodes usually failed to join the cluster. This
time I was able to shut down, restart, and go yellow in under two minutes. It
probably could have been faster if I had removed some of the pauses I added to
"help" with previous full-restart issues.
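Waiting for the cluster to "go yellow" after a full restart is easy to script instead of watching it by hand. Below is a minimal sketch, assuming a node answers HTTP on `localhost:9200` and that the cluster health endpoint (`_cluster/health`) accepts the `wait_for_status` and `timeout` parameters, as it does in the 0.90 series as far as I know. The helper names `status_at_least` and `wait_for_cluster_status` are hypothetical.

```python
import json
import time
import urllib.request

# Health states ordered from worst to best: "yellow" means all primary shards
# are allocated, "green" means all replicas are allocated as well.
STATUS_ORDER = {"red": 0, "yellow": 1, "green": 2}


def status_at_least(health: dict, target: str = "yellow") -> bool:
    """Return True if a parsed _cluster/health response meets the target status."""
    status = health.get("status", "red")
    return STATUS_ORDER.get(status, 0) >= STATUS_ORDER[target]


def wait_for_cluster_status(base_url: str = "http://localhost:9200",
                            target: str = "yellow",
                            deadline_s: float = 300.0,
                            poll_s: float = 5.0) -> bool:
    """Poll the (assumed) health endpoint until the cluster reaches `target`.

    Connection errors are expected while nodes are still starting, so they
    count as "not ready yet" rather than as failures. Returns False if the
    deadline passes first.
    """
    server_wait = max(1, int(poll_s))  # let the server block briefly per request
    url = f"{base_url}/_cluster/health?wait_for_status={target}&timeout={server_wait}s"
    give_up_at = time.monotonic() + deadline_s
    while time.monotonic() < give_up_at:
        try:
            with urllib.request.urlopen(url, timeout=poll_s + 5) as resp:
                if status_at_least(json.load(resp), target):
                    return True
        except (OSError, ValueError):
            pass  # node not up yet, health wait timed out, or unparsable response
        time.sleep(poll_s)
    return False
```

In a restart script this would run after the start phase, e.g. `wait_for_cluster_status(deadline_s=600)`, and only then would the script stop counting downtime.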

Andy


(ppearcy) #3

If you can fit all your data on a subset of your nodes, keep two clusters in
sync, and have the smarts to know which cluster to route searches to (eeesh,
lots of ifs), you can run two clusters and move nodes from the old one to the
new one, one by one. This also gives you the ability to revert to the old
version if you run into issues with the new one.
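The two-cluster approach needs something in front of Elasticsearch that writes to both clusters and sends searches only to the one currently trusted. A minimal sketch, assuming each cluster is wrapped in a client object with hypothetical `index(doc)` and `search(query)` methods (placeholders, not a real client API); `DualClusterRouter` and `cut_over` are likewise illustrative names.

```python
from typing import Any, Dict, List, Protocol


class ClusterClient(Protocol):
    """Hypothetical minimal interface; adapt it to whatever client you use."""

    def index(self, doc: Dict[str, Any]) -> None: ...
    def search(self, query: str) -> List[Dict[str, Any]]: ...


class DualClusterRouter:
    """Writes go to both clusters; searches go only to the active one."""

    def __init__(self, old: ClusterClient, new: ClusterClient) -> None:
        self.clusters = {"old": old, "new": new}
        self.active = "old"  # keep serving from the known-good cluster at first

    def index(self, doc: Dict[str, Any]) -> None:
        # Dual-write so both clusters hold the same data while nodes migrate.
        for client in self.clusters.values():
            client.index(doc)

    def search(self, query: str) -> List[Dict[str, Any]]:
        return self.clusters[self.active].search(query)

    def cut_over(self, name: str) -> None:
        # Flip searches to "new" once it is healthy, or back to "old" to revert.
        if name not in self.clusters:
            raise ValueError(f"unknown cluster: {name}")
        self.active = name
```

Reverting is then just `cut_over("old")`. The sketch deliberately skips the hard part: what to do when a write succeeds on one cluster and fails on the other.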

That takes a lot of smarts on your side, though; the quickest and easiest
option is likely to shut everything down, upgrade every node, and start it all
back up.
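The shut-everything-down route is mostly about ordering: no node should start until every node has been stopped and upgraded, otherwise 0.20.5 and 0.90.3 nodes could end up in one cluster. A minimal sketch of that barrier logic, assuming a hypothetical `run_on_all(hosts, step)` callable that runs one step on every host and returns the set of hosts where it failed; the step names are placeholders for your own stop/upgrade/start commands.

```python
from typing import Callable, Iterable, List, Set

# Each phase must succeed on every host before the next phase begins.
PHASES = ["stop", "upgrade", "start"]


def full_restart(hosts: Iterable[str],
                 run_on_all: Callable[[List[str], str], Set[str]]) -> List[str]:
    """Run stop, upgrade, start as barriers and return the completed phases.

    Raises as soon as any host fails a phase, so a half-upgraded set of
    nodes is never started together.
    """
    host_list = list(hosts)
    completed: List[str] = []
    for phase in PHASES:
        failed = run_on_all(host_list, phase)
        if failed:
            raise RuntimeError(f"{phase} failed on: {sorted(failed)}")
        completed.append(phase)
    return completed
```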

Best Regards,
Paul

(In reply to Andy Wick, Wednesday, August 7, 2013, 12:29:32 PM UTC-6)


(Jérôme Gagnon) #4

We did the upgrade on a 100+ node cluster with about 3 minutes of downtime.
It wasn't that bad; you just have to be prepared and have the right tools.

(In reply to Daniel Maher, Wednesday, August 7, 2013, 3:56:00 AM UTC-4)


(Nik Everett) #5

(In reply to Jérôme Gagnon, Thursday, August 8, 2013, 9:49 AM)

It'd be really useful if you could explain some of your good tools!

Nik


(Jérôme Gagnon) #6

Tools can be as simple as parallel-ssh and some bash scripts. That is
error-prone and kind of sketchy, but it is one of the simplest possible
solutions.

A safer choice is probably Chef, Puppet, or another automation framework, for
more robustness and flexibility.
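The core of the parallel-ssh-plus-scripts approach is running one command on every host at once and collecting failures, so a node that didn't stop or upgrade is noticed instead of silently skipped. A minimal Python sketch of that fan-out, assuming passwordless `ssh` access to each host; `run_on_host`, `run_everywhere`, and `failed_hosts` are hypothetical helpers.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_on_host(host: str, command: str, timeout_s: float = 120.0) -> int:
    """Run `command` on `host` via ssh and return its exit code (-1 on timeout)."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],  # never prompt for a password
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.returncode
    except subprocess.TimeoutExpired:
        return -1


def run_everywhere(hosts: List[str], command: str,
                   runner: Callable[[str, str], int] = run_on_host,
                   max_workers: int = 32) -> Dict[str, int]:
    """Run the same command on all hosts in parallel; map each host to its exit code."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, codes))


def failed_hosts(results: Dict[str, int]) -> List[str]:
    """Hosts whose command exited non-zero or timed out."""
    return sorted(h for h, code in results.items() if code != 0)
```

A restart script would check that `failed_hosts(run_everywhere(hosts, stop_cmd))` comes back empty before moving on, where `stop_cmd` is a placeholder for however your nodes are actually managed.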

Jerome

(In reply to Nikolas Everett, Thursday, August 8, 2013, 10:00:04 AM UTC-4)


(Colin Surprenant) #7

Hey! If I may chime in, you probably want to look into Ansible, which offers
very efficient and simple automation facilities that other provisioning tools
like Chef and Puppet don't really have. I'm not affiliated with Ansible; I
just recently had an "aha!" moment with it in exactly this kind of context.

Have fun,
Colin

(In reply to Jérôme Gagnon, Thursday, August 8, 2013, 10:17:59 AM UTC-4)

