This really depends on how much data you're dealing with and whether it can
all fit onto a single node.
To summarize my setup:
- Have ~400GB of data
- 3 production clusters, 4 nodes each (each node w/ 1.2TB of disk), each
cluster in a different DC
So, all my data can fit on a single node, which helps.
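If you want to run the same sanity check on your own numbers, here is a minimal Python sketch. The data size and disk size are the figures from my setup above; the 85% fill ceiling and the function name are just assumptions for illustration, so pick whatever headroom you're comfortable with.

```python
# Figures from the setup described above.
DATA_GB = 400
NODE_DISK_GB = 1200


def fits_on_nodes(data_gb, node_disk_gb, nodes=1, max_fill=0.85):
    """Return True if one full copy of the data fits on `nodes` nodes.

    max_fill is an assumed safety margin (fraction of disk we allow to fill),
    not a value recommended by Elasticsearch.
    """
    return data_gb <= node_disk_gb * nodes * max_fill


if __name__ == "__main__":
    print(fits_on_nodes(DATA_GB, NODE_DISK_GB))
```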
There are two methods we use. The easier way is to move traffic from one of
the DCs and restart the cluster. Not everyone has this option, but it is
really the easiest way to go. You can typically get things back into a
yellow state in under a minute, so the downtime is quite minimal.
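If you script the restart, one way to know when it's safe to send traffic back is to poll the cluster health API with wait_for_status=yellow. This is only a rough Python sketch and not our actual tooling: the base URL is whatever your node listens on, and the 60s/90s timeouts are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

STATUS_ORDER = ["red", "yellow", "green"]


def health_url(base_url, status="yellow", timeout="60s"):
    """Build a cluster health URL that blocks until `status` or the timeout."""
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


def is_ready(health, wanted="yellow"):
    """True if the health response reached at least `wanted` without timing out."""
    if health.get("timed_out", False):
        return False
    return STATUS_ORDER.index(health["status"]) >= STATUS_ORDER.index(wanted)


def wait_for_yellow(base_url):
    """Ask the cluster to wait for yellow; the HTTP timeout is an assumed 90s."""
    with urlopen(health_url(base_url), timeout=90) as resp:
        return is_ready(json.load(resp))
```

Something like wait_for_yellow("http://localhost:9200") would then return True once the cluster reports yellow or green, assuming a node is reachable at that address.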
The other way allows us to keep things up... but we still move traffic from
the DC during some steps because of performance degradation when shards
move around. Anyways this method is:
- Remove a node from the cluster
- Install new s/w there and ensure it isn't getting any search traffic
- Rebuild our data to the new single-node cluster. We rebuild from our
primary datastore using some in-house software that replicates any
updates/adds/deletes to both clusters.
- Move nodes from old to new cluster
- We'll often leave one node running the old cluster for a short period to
allow us to roll back if necessary.
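To give an idea of the rebuild and replication steps above, here is a minimal Python sketch. It is not our in-house software: InMemoryIndex, DualWriter, and rebuild are hypothetical names, and the in-memory class only stands in for a real cluster client so the example runs on its own.

```python
class InMemoryIndex:
    """Stand-in for a cluster client (hypothetical); a real one would call the cluster."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class DualWriter:
    """Apply every change to all target clusters so old and new stay in sync."""

    def __init__(self, *targets):
        self.targets = targets

    def apply(self, op, doc_id, doc=None):
        # Validate first so a bad op never leaves the targets half-updated.
        if op not in ("add", "update", "delete"):
            raise ValueError(f"unknown op: {op}")
        for target in self.targets:
            if op == "delete":
                target.delete(doc_id)
            else:
                target.index(doc_id, doc)


def rebuild(primary_rows, target):
    """Load a full copy from the primary datastore into the new cluster."""
    for doc_id, doc in primary_rows:
        target.index(doc_id, doc)


if __name__ == "__main__":
    old, new = InMemoryIndex(), InMemoryIndex()
    old.index("1", {"title": "a"})
    rebuild([("1", {"title": "a"})], new)      # new cluster catches up
    writer = DualWriter(old, new)
    writer.apply("add", "2", {"title": "b"})   # goes to both clusters
    writer.apply("delete", "1")                # removed from both clusters
    print(old.docs == new.docs)
```

The point of writing to both is that the old cluster stays current, which is what makes the rollback in the last step possible.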
So, a lot of this depends on how you're ingesting data into your cluster
and whether all your data can fit on a subset of nodes.
Hope this helps,
On Friday, October 12, 2012 11:25:00 AM UTC-6, Jérôme Gagnon wrote:
I was just wondering how you guys go about upgrading ElasticSearch
versions? I'm thinking about having 2 identical clusters and upgrading one
at a time to avoid downtime... since the protocol is binary you can't
really just stop 1 node at a time and restart it, am I right? I just
wanted to know how to do it. I saw a method somewhere involving firewalls
and things like that, but I don't want to go there yet. I want to have 0
downtime, as you may have guessed.