Cluster upgrade?


(Jérôme Gagnon) #1

Hi everyone,

I was just wondering how you guys go about upgrading ElasticSearch versions.
I'm thinking about having 2 identical clusters and upgrading one at a time
to avoid downtime. Since the protocol is binary, you can't really just stop
1 node at a time and restart it, am I right? I just wanted to know how to do
it. I saw a method somewhere involving a firewall and things like that, but
I don't want to go there yet. I want to have 0 downtime, as you may have
guessed :)

Greetings,

Jerome Gagnon

--


(ppearcy) #2

This really depends on how much data you're dealing with and if it can all
fit onto a single node.

To summarize my setup:

  • Have ~400GB of data
  • 3 production clusters, 4 nodes each (each node w/ 1.2TB of disk), each
    cluster in a different DC

So, all my data can fit on a single node, which helps.

There are two methods we use. The easier way is to move traffic from one of
the DCs and restart the cluster. Not everyone has this option, but it is
really the easiest way to go. You can typically get things back into a
yellow state in under a minute, so the downtime is quite minimal.
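As a rough illustration of checking when a restarted cluster is back in a yellow state, here is a minimal sketch that asks the cluster health endpoint to block until the status is reached. The base URL, the 60s timeout, and the helper names (`health_url`, `http_get_json`, `cluster_is_back`) are assumptions for illustration, not part of the setup described above.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode


def health_url(base_url, status="yellow", timeout="60s"):
    """Build a cluster health URL that blocks until `status` or `timeout`."""
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


def http_get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def cluster_is_back(base_url, fetch=http_get_json, status="yellow"):
    """Return True if the cluster reached at least `status` before the timeout."""
    try:
        health = fetch(health_url(base_url, status))
    except urllib.error.URLError:
        # Node unreachable, or the server answered with an error status.
        return False
    # The health response sets timed_out to true when the wait expired.
    return not health.get("timed_out", True)
```

Calling `cluster_is_back("http://localhost:9200")` right after the restart (the address is an assumed example) gives a simple signal for when traffic could be moved back.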

The other way allows us to keep things up... but we still move traffic from
the DC during some steps because of performance degradation when shards
move around. Anyways this method is:

  • Remove a node from the cluster
  • Install new s/w there and ensure it isn't getting any search traffic
  • Rebuild our data to the new single node cluster. We rebuild from our
    primary datastore using some inhouse software that replicates any
    updates/adds/deletes to both.
  • Move nodes from old to new cluster
  • We'll often leave one node running the old cluster for a short period to
    allow us to roll back if necessary.
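
The in-house replication software mentioned in the steps above is not shown in this thread. Purely as a hypothetical sketch of the idea (every add, update, or delete is applied to both the old and the new cluster), something like the following could work. `DualWriter`, `Sink`, and `memory_sink` are invented names, and the dict-backed sink stands in for a real cluster client.

```python
from typing import Callable, Dict, Optional

# A sink applies one change to one cluster: (op, doc_id, doc) -> None.
Sink = Callable[[str, str, Optional[dict]], None]


class DualWriter:
    """Fan every change out to both the old and the new cluster."""

    def __init__(self, old_sink: Sink, new_sink: Sink):
        self.sinks = {"old": old_sink, "new": new_sink}

    def apply(self, op: str, doc_id: str,
              doc: Optional[dict] = None) -> Dict[str, bool]:
        if op not in ("index", "delete"):
            raise ValueError(f"unsupported op: {op}")
        results = {}
        for name, sink in self.sinks.items():
            try:
                sink(op, doc_id, doc)
                results[name] = True
            except Exception:
                # A failure on one cluster does not block the other;
                # the caller is expected to retry or re-sync later.
                results[name] = False
        return results


def memory_sink(store: dict) -> Sink:
    """Stand-in for a real cluster client, backed by a plain dict."""
    def sink(op, doc_id, doc):
        if op == "index":
            store[doc_id] = doc
        else:
            store.pop(doc_id, None)
    return sink
```

Keeping both clusters fed this way is what makes it possible to leave an old node running for rollback, since neither side falls behind while nodes are being moved.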

So, a lot of this depends on how you're ingesting data into your cluster
and if all your data can fit on a subset of nodes.
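
To make the "does it fit on a subset of nodes" question concrete, here is a small back-of-the-envelope sketch. The function name and the 85% usable-disk fraction are assumptions for illustration; the 400GB and 1.2TB figures come from the setup summarized above, and replicas are ignored.

```python
import math


def min_nodes_for_data(data_gb, node_disk_gb, usable_fraction=0.85):
    """Smallest number of nodes whose usable disk can hold the data once."""
    if data_gb < 0 or node_disk_gb <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("sizes must be positive and the fraction in (0, 1]")
    usable_per_node = node_disk_gb * usable_fraction
    return max(1, math.ceil(data_gb / usable_per_node))


# ~400GB of data on nodes with 1.2TB of disk each
print(min_nodes_for_data(400, 1200))  # prints 1
```

If the result is larger than the number of nodes you can pull out of the running cluster, the single-node rebuild approach does not apply as-is.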

Hope this helps,
Paul

--


(Tanguy) #3

Hi Jerome,

Clinton wrote a nice article on how he proceeds to upgrade an ES cluster:

https://groups.google.com/d/topic/elasticsearch/aB8GbaYIuqE/discussion

-- Tanguy
Twitter: @tlrx

--


(Jérôme Gagnon) #4

Yes, I saw that, and that was basically my point: since all of my data
cannot fit on one node, I will need 2 mirror clusters to complete the
process.

On Monday, October 15, 2012 4:16:32 AM UTC-4, Tanguy wrote:

Hi Jerome,

Clinton wrote a nice article on how he proceeds to upgrade an ES cluster:

https://groups.google.com/d/topic/elasticsearch/aB8GbaYIuqE/discussion
https://gist.github.com/3888120

-- Tanguy
Twitter: @tlrx
https://github.com/tlrx

--

