Upgrade a 48-node cluster with minimal downtime?

Daniel_Maher_3 · August 7, 2013, 7:56am

Hello,

I've got a 48-node ES cluster running 0.20.5 that I'd like to upgrade to
0.90.3 with as little downtime as possible. I realise that this is a
tall order, as the release notes (and past experience) make it clear
that mixing versions is a Bad Idea, thus I can't simply roll through the
nodes one by one.

Any hints?

--
dan (phrawzty).
mozilla webops; european outpost.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Andy_Wick · August 7, 2013, 6:29pm

No hints, other than to say that 0.90.3 was the first install/full restart
I've done on our 30-node cluster that didn't end up with split brain! Woot!
Usually one or two nodes don't join the cluster. This time I was able to
shut down, restart, and go yellow in under two minutes. It probably could
have been faster if I had removed some of the pauses I added to "help" with
previous full-restart issues.
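
As a rough illustration of timing "go yellow" after a full restart, here is a minimal Python sketch that polls the cluster health endpoint (`GET /_cluster/health`) until the reported status is at least yellow. The host, port, timeout, and the helper names `fetch_status` and `wait_for_status` are assumptions made for this example, not something taken from the setup described above.

```python
import json
import time
import urllib.request

# Rank the three health colours so "at least yellow" can be compared.
RANK = {"red": 0, "yellow": 1, "green": 2}


def fetch_status(base_url="http://localhost:9200"):
    """Return the 'status' field of GET /_cluster/health (assumed host/port)."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return json.load(resp)["status"]


def wait_for_status(wanted="yellow", timeout=300.0, interval=2.0,
                    fetch=fetch_status, sleep=time.sleep, clock=time.monotonic):
    """Poll until the cluster is at least `wanted`; return elapsed seconds.

    Raises TimeoutError if the deadline passes first. Connection errors are
    treated as "not up yet", since nodes refuse connections while restarting.
    """
    start = clock()
    while True:
        try:
            status = fetch()
        except OSError:  # includes URLError and ConnectionRefusedError
            status = None
        if status is not None and RANK[status] >= RANK[wanted]:
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("cluster did not reach %s in time" % wanted)
        sleep(interval)
```

Calling `wait_for_status("yellow")` right after starting the nodes would give a number comparable to the "under two minutes" figure, without any fixed pauses in the restart script.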

Andy

ppearcy · August 7, 2013, 9:47pm

If you are able to fit all your data on a subset of nodes, keep two
clusters in sync, and have the smarts to know which cluster to route
searches to (eesh, lots of ifs), you can run two clusters and move nodes
from the old one to the new one, one by one. This also lets you revert to
the old version if you run into issues with the new one.

That needs a lot of smarts on your side. The quickest and easiest option is
likely to shut everything down, upgrade every node, and start it back up.
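
To make the two-cluster idea above a little more concrete, here is a minimal Python sketch of the routing "smarts": every write goes to both clusters, searches go to whichever cluster is currently active, and a single switch cuts over or rolls back. `DualClusterRouter` and `InMemoryCluster` are hypothetical names invented for this sketch; a real setup would wrap actual Elasticsearch clients and would need to handle a write that succeeds on one cluster and fails on the other.

```python
class DualClusterRouter:
    """Write to both clusters, read from whichever one is currently active."""

    def __init__(self, old_cluster, new_cluster):
        self.clusters = {"old": old_cluster, "new": new_cluster}
        self.active = "old"

    def index(self, doc_id, document):
        # Every write goes to both clusters so that either can serve reads.
        for cluster in self.clusters.values():
            cluster.index(doc_id, document)

    def search(self, query):
        # Reads only ever hit the active cluster.
        return self.clusters[self.active].search(query)

    def cut_over(self):
        self.active = "new"

    def roll_back(self):
        self.active = "old"


class InMemoryCluster:
    """Stand-in for a real client, used only to exercise the router."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, document):
        self.docs[doc_id] = document

    def search(self, query):
        # Naive substring match over the stored documents.
        return sorted(d for d, text in self.docs.items() if query in text)


old, new = InMemoryCluster(), InMemoryCluster()
router = DualClusterRouter(old, new)
router.index("1", "hello world")
router.cut_over()                # searches now go to the new cluster
hits = router.search("hello")
router.roll_back()               # revert if the new version misbehaves
```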

Best Regards,
Paul

Jerome_Gagnon · August 8, 2013, 1:49pm

We upgraded a 100+ node cluster with about 3 minutes of downtime. It
wasn't that bad; you just have to be prepared and have good tools.

nik9000 · August 8, 2013, 2:00pm

It'd be really useful if you could explain some of your good tools!

Nik

Jerome_Gagnon · August 8, 2013, 2:17pm

Tools can be as simple as parallel-ssh and (some) bash scripts. That is
error-prone and kind of sketchy, but it is one of the simplest possible
solutions.

It is probably safer to use Chef, Puppet, or any other automation
framework, for more robustness and flexibility.
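
For anyone wondering what the "simple" end of that spectrum might look like, here is a minimal Python sketch of a parallel-ssh style fan-out. It runs each phase (stop, upgrade, start) on every node before moving on to the next, so no upgraded node ever joins a cluster that still has old nodes running. The commands in `PHASES`, the package path, and the function names are assumptions for illustration only; they depend on how Elasticsearch was installed on the nodes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command):
    """Run `command` on `host` over ssh and return its exit code."""
    result = subprocess.run(["ssh", "-o", "BatchMode=yes", host, command],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode


def run_everywhere(hosts, command, run=ssh_run, max_workers=16):
    """Run one command on all hosts in parallel; return the hosts that failed.

    `hosts` must be a list, since it is iterated twice.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda host: run(host, command), hosts))
    return [host for host, code in zip(hosts, codes) if code != 0]


def full_restart(hosts, phases, run=ssh_run):
    """Run each (name, command) phase on every host before starting the next."""
    for name, command in phases:
        failed = run_everywhere(hosts, command, run=run)
        if failed:
            # Stop here rather than start a half-upgraded cluster.
            raise RuntimeError("phase %r failed on %s" % (name, ", ".join(failed)))


# Hypothetical commands: adjust to the init system and package format in use.
PHASES = [
    ("stop", "sudo service elasticsearch stop"),
    ("upgrade", "sudo dpkg -i /tmp/elasticsearch-0.90.3.deb"),
    ("start", "sudo service elasticsearch start"),
]
```

A call such as `full_restart(["es01", "es02"], PHASES)` would then perform the whole shutdown, upgrade, and start sequence. Passing a different `run` function makes it possible to dry-run the sequence without touching any node.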

Jerome

colinsurprenant · February 14, 2014, 8:05pm

Hey! If I may chime in, you probably want to look into Ansible, which
offers very efficient and simple automation facilities that other
provisioning tools like Chef and Puppet don't really have. I am not
affiliated with Ansible; I just recently had an "a-ha!" moment with it in
exactly this kind of situation.

Have fun,
Colin
